Cancer Score for Assessment and Response Prediction from Biological Fluids

ABSTRACT

Methods for analyzing omics data and using the omics data to determine prognosis of a cancer, to predict an outcome of a treatment, and/or to determine an effectiveness of a treatment are presented. In preferred methods, blood from a patient having a cancer or suspected to have a cancer is obtained and blood omics data for a plurality of cancer-related, inflammation-related, or DNA repair-related genes are obtained. A cancer score can be calculated based on the omics data, which then can be used to provide a cancer prognosis, a therapeutic recommendation, or an effectiveness of a treatment.

This application is a continuation application of allowed US application having Ser. No. 16/754,088, which was filed Apr. 6, 2020, and which is a 371 application of PCT/US2018/055481, which was filed Oct. 11, 2018, and which claims priority to US provisional application having the Ser. No. 62/571,414, filed Oct. 12, 2017, all of which are incorporated by reference in their entirety herein.

FIELD OF THE INVENTION

The field of the invention is profiling of omics data as they relate to cancer, especially as it relates to the generation of indicators for cancer prognosis, prediction of treatment outcomes, and/or effectiveness of cancer treatments.

BACKGROUND OF THE INVENTION

The background description includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed invention, or that any publication specifically or implicitly referenced is prior art.

All publications and patent applications herein are incorporated by reference to the same extent as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference. Where a definition or use of a term in an incorporated reference is inconsistent or contrary to the definition of that term provided herein, the definition of that term provided herein applies and the definition of that term in the reference does not apply.

Cancer is a multifactorial disease where many diverse genetic and environmental factors interplay and contribute to the development and outcome of the disease. In addition, genetic and environmental factors often affect the patient's prognosis in various degrees such that individual patients may show different responses to the same therapeutic and/or prophylactic treatment. Such complexity and diversity render traditional prediction of prognosis, identification of optimal treatments, and prediction of likelihood of success of the treatments based on a single or few factors (e.g., serum level of inflammation-related proteins, etc.) often unreliable. Further, many traditional methods of examining such factors are invasive as they require tumor biopsy samples for histology of tumor cells and tissues.

More recently, DNA or RNA populations present in the peripheral blood have drawn attention for analyzing genetic abnormalities associated with the cancer status. For example, U.S. Pat. No. 9,422,592 discloses the measurement of cell free RNA (cfRNA) of the formyl peptide receptor gene (FPR1) and its association with the patient's risk for having lung cancer or non-small cell lung cancer (NSCLC). Yet, such studies are limited to a small number of genes, which are typically weighed equally in determining the cancer status. As multiple factors affect the prognosis of most cancers to various degrees, oversimplification may cause inaccurate prognosis and/or prediction of treatment outcome.

Thus, even though some examples of using cell free nucleic acid in determining cancer status are known, differentially weighed, multi-factor approaches in determining cancer status using cell free nucleic acid are largely unexplored. Thus, there remains a need for improved methods of analyzing omics data of cell free nucleic acids in determining the status and prognosis of a cancer as well as the likelihood of treatment outcome or effectiveness of the treatment.

SUMMARY OF THE INVENTION

The inventive subject matter is directed to methods of using various omics data of cell free nucleic acids to calculate a composite cancer score that can be used to determine the status and prognosis of a cancer as well as the likelihood of treatment outcome and/or effectiveness of current treatments. Thus, one aspect of the subject matter includes a method of analyzing omics data. In this method, blood is obtained from a patient having or suspected to have a cancer. From the blood, omics data for a plurality of cancer-related genes are obtained. Most preferably, the omics data include at least one of DNA sequence data, RNA sequence data, and RNA expression level data. From the omics data, a composite score is calculated which can then be associated with at least one of a health status, an omics error status, a cancer prognosis, a therapeutic recommendation, and an effectiveness of a treatment.

In some embodiments, the DNA sequence data is selected from the group consisting of mutation data, copy number data duplication, loss of heterozygosity data, and epigenetic status. Optionally, the DNA sequence data is obtained from circulating free DNA. In other embodiments, the RNA sequence data is selected from the group consisting of mRNA sequence data and splice variant data, and/or the RNA expression level data is selected from the group consisting of a quantity of RNA transcript and a quantity of a small noncoding RNA. Optionally, the RNA sequence data is obtained from the group consisting of circulating tumor RNA and circulating free RNA.

Typically, the plurality of cancer-related genes comprises at least one of a cancer-related gene, a cancer-specific gene, a DNA-repair gene, a neoepitope, and a gene not associated with a disease. Preferably, the neoepitope is tumor-specific and patient-specific. In some embodiments, the plurality of cancer-related genes includes a cancer-specific gene, and the score is calculated based on a presence or an absence of a mutation in the cancer-specific gene. In such embodiments, it is preferred that the presence of the mutation in the cancer-specific gene weighs more than the presence of the mutation in the cancer-related genes other than the cancer-specific gene. In other embodiments, the score is calculated based on a type of a splice variant of the cancer gene or a ratio between or among a plurality of splice variants of the cancer gene.

In some embodiments, the method further comprises a step of comparing the score with a threshold value to thereby determine the therapeutic recommendation. In such embodiments, it is preferred that the therapeutic recommendation is a prophylactic treatment if the score is below the threshold value. Alternatively and/or additionally, the method further comprises a step of comparing the omics error status with a threshold value to thereby determine a risk score.
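
By way of illustration only, the differential weighting and threshold comparison described above could be sketched as follows. The gene labels, the weight values, the threshold, the function names `composite_score` and `recommend`, and the label returned for scores at or above the threshold are all hypothetical placeholders that do not appear in this disclosure; the sketch shows one possible weighted sum, not the scoring method itself.

```python
from typing import Dict

# Hypothetical weights (assumptions): a mutation in a cancer-specific gene
# counts more than a mutation in another cancer-related gene.
WEIGHTS: Dict[str, float] = {
    "cancer_specific": 3.0,
    "cancer_related": 1.0,
}


def composite_score(mutations: Dict[str, bool], gene_class: Dict[str, str]) -> float:
    """Sum the class weight of every gene in which a mutation is present."""
    score = 0.0
    for gene, present in mutations.items():
        if present:
            score += WEIGHTS[gene_class[gene]]
    return score


def recommend(score: float, threshold: float) -> str:
    """Compare the score with a threshold to select a recommendation label."""
    # Per the text, a score below the threshold maps to prophylactic treatment.
    if score < threshold:
        return "prophylactic treatment"
    # Placeholder label for the other branch; not specified in the source.
    return "further evaluation"


# Hypothetical example data.
gene_class = {
    "GENE_A": "cancer_specific",
    "GENE_B": "cancer_related",
    "GENE_C": "cancer_related",
}
mutations = {"GENE_A": True, "GENE_B": True, "GENE_C": False}

example_score = composite_score(mutations, gene_class)
print(example_score, recommend(example_score, threshold=5.0))
```

In this sketch the weights are fixed per gene class; weights could equally be assigned per gene or per data type without changing the threshold comparison.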

In another aspect of the inventive subject matter, the inventors contemplate a method of determining prognosis of a cancer of a patient. In this method, blood is obtained from a patient having or suspected to have a cancer. From the blood, omics data for a plurality of cancer genes are obtained. Preferably, the omics data include at least one of DNA sequence data, RNA sequence data, and RNA expression level data. From the omics data, a cancer prognosis score is calculated, and the prognosis of the cancer is provided based on the cancer prognosis score. In some embodiments, the prognosis comprises a progress of metastasis.

In some embodiments, the DNA sequence data is selected from the group consisting of mutation data, copy number data duplication, loss of heterozygosity data, and epigenetic status. Optionally, the DNA sequence data is obtained from circulating free DNA. In other embodiments, the RNA sequence data is selected from the group consisting of mRNA sequence data and splice variant data, and/or the RNA expression level data is selected from the group consisting of a quantity of RNA transcript and a quantity of a small noncoding RNA. Optionally, the RNA sequence data is obtained from the group consisting of circulating tumor RNA and circulating free RNA.

Typically, the plurality of cancer-related genes comprises at least one of a cancer-related gene, a cancer-specific gene, a DNA-repair gene, a neoepitope, and a gene not associated with a disease. Preferably, the neoepitope is tumor-specific and patient-specific. In some embodiments, the plurality of cancer-related genes includes a cancer-specific gene, and the score is calculated based on a presence or an absence of a mutation in the cancer-specific gene. In other embodiments, the score is calculated based on a type of a splice variant of the cancer gene or a ratio among or between a plurality of splice variants of the cancer gene.

In some embodiments, the omics data is a plurality of sets of omics data obtained at different time points during a time period, and the prognosis is provided based on a plurality of scores from the plurality of sets of omics data. In such embodiments, it is preferred that the prognosis is represented by a change of a plurality of scores during the time period, wherein the change is over a predetermined threshold value.
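
By way of illustration only, a change in scores over a time period could be evaluated against a predetermined threshold as sketched below. The text does not define how the change is measured; taking the difference between the last and first scores, using its absolute value, and the names `score_change` and `change_exceeds_threshold` are assumptions made for this example.

```python
from typing import Sequence


def score_change(scores: Sequence[float]) -> float:
    """Difference between the last and first score of a time-ordered series.

    Assumption: the change is measured end to end; other definitions
    (for example, a slope over all time points) are equally possible.
    """
    if len(scores) < 2:
        raise ValueError("at least two time points are required")
    return scores[-1] - scores[0]


def change_exceeds_threshold(scores: Sequence[float], threshold: float) -> bool:
    """Return True when the magnitude of the change is over the threshold."""
    return abs(score_change(scores)) > threshold


# Hypothetical scores from three blood draws during one time period.
series = [2.0, 3.5, 6.0]
print(score_change(series), change_exceeds_threshold(series, threshold=3.0))
```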

Still another aspect of the inventive subject matter is directed towards a method of predicting an outcome of a treatment for a cancer patient. In this method, blood is obtained from a patient having a cancer. From the blood, omics data for a plurality of cancer genes are obtained. Preferably, the omics data include at least one of DNA sequence data, RNA sequence data, and RNA expression level data. From the omics data, a cancer gene score is calculated, and a predicted outcome of the treatment is provided based on the cancer gene score. Preferably, the predicted outcome is determined by comparing the cancer gene score with a predetermined threshold value.

In some embodiments, the treatment is a drug, and at least one of the plurality of cancer genes is a predicted target of the drug. In other embodiments, the treatment is an immune therapy, and at least one of the plurality of cancer genes is a receptor of an immune cell or a ligand of the receptor. In still other embodiments, the treatment is a surgery or a radiation therapy, and at least one of the plurality of cancer genes is a neoepitope that is tumor-specific and patient-specific.

In some embodiments, the DNA sequence data is selected from the group consisting of mutation data, copy number data duplication, loss of heterozygosity data, and epigenetic status. Optionally, the DNA sequence data is obtained from circulating free DNA. In other embodiments, the RNA sequence data is selected from the group consisting of mRNA sequence data and splice variant data, and/or the RNA expression level data is selected from the group consisting of a quantity of RNA transcript and a quantity of a small noncoding RNA. Optionally, the RNA sequence data is obtained from the group consisting of circulating tumor RNA and circulating free RNA.

Typically, the plurality of cancer-related genes comprises at least one of a cancer-related gene, a cancer-specific gene, a DNA-repair gene, a neoepitope, and a gene not associated with a disease. Preferably, the neoepitope is tumor-specific and patient-specific. In some embodiments, the plurality of cancer-related genes includes a cancer-specific gene, and the score is calculated based on a presence or an absence of a mutation in the cancer-specific gene. In other embodiments, the score is calculated based on a type of a splice variant of the cancer gene or a ratio between a plurality of splice variants of the cancer gene.

In still another aspect of the inventive subject matter, the inventors contemplate a method of evaluating an effectiveness of a treatment for a cancer patient. In this method, blood is obtained from a patient having a cancer. From the blood, omics data for a plurality of cancer genes are obtained before and after the treatment. Preferably, the omics data include at least one of DNA sequence data, RNA sequence data, and RNA expression level data. From the omics data, at least two cancer gene scores corresponding to the omics data before and after the treatment, respectively, are generated, and the effectiveness of the treatment is provided based on the comparison of the at least two cancer gene scores. In some embodiments, the effectiveness of the treatment can be determined by a difference between the cancer gene score before and after the treatment. In such embodiments, it is preferred that the treatment is determined effective when the difference is higher than a predetermined threshold value.
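
By way of illustration only, the before/after comparison described above could be sketched as follows. The sign convention (difference taken as the score before treatment minus the score after treatment, so that a falling score counts toward effectiveness), the example values, and the function name `treatment_effective` are assumptions for this example and are not specified in this disclosure.

```python
def treatment_effective(score_before: float, score_after: float, threshold: float) -> bool:
    """Return True when the score difference is higher than the threshold.

    Assumption: the difference is before minus after, so a drop in the
    cancer gene score after treatment yields a positive difference.
    """
    difference = score_before - score_after
    return difference > threshold


# Hypothetical scores computed from blood drawn before and after treatment.
print(treatment_effective(score_before=8.0, score_after=3.0, threshold=4.0))
print(treatment_effective(score_before=8.0, score_after=6.0, threshold=4.0))
```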

In some embodiments, the treatment is a drug, and at least one of the plurality of cancer genes is a predicted target of the drug. In other embodiments, the treatment is an immune therapy, and at least one of the plurality of cancer genes is a receptor of an immune cell or a ligand of the receptor. In still other embodiments, the treatment is a surgery or a radiation therapy, and at least one of the plurality of cancer genes is a neoepitope that is tumor-specific and patient-specific.

In some embodiments, the DNA sequence data is selected from the group consisting of mutation data, copy number data duplication, loss of heterozygosity data, and epigenetic status. Optionally, the DNA sequence data is obtained from circulating free DNA. In other embodiments, the RNA sequence data is selected from the group consisting of mRNA sequence data and splice variant data, and/or the RNA expression level data is selected from the group consisting of a quantity of RNA transcript and a quantity of a small noncoding RNA. Optionally, the RNA sequence data is obtained from the group consisting of circulating tumor RNA and circulating free RNA.

Typically, the plurality of cancer-related genes comprises at least one of a cancer-related gene, a cancer-specific gene, a DNA-repair gene, a neoepitope, and a gene not associated with a disease. Preferably, the neoepitope is tumor-specific and patient-specific. In some embodiments, the plurality of cancer-related genes includes a cancer-specific gene, and the score is calculated based on a presence or an absence of a mutation in the cancer-specific gene. In other embodiments, the score is calculated based on a type of a splice variant of the cancer gene or a ratio between a plurality of splice variants of the cancer gene.
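
By way of illustration only, a ratio among a plurality of splice variants of one gene could be derived from per-variant transcript quantities as sketched below. Expressing the ratio as each variant's fraction of the total, the variant labels, the example quantities, and the function name `splice_variant_fractions` are assumptions for this example.

```python
from typing import Dict


def splice_variant_fractions(quantities: Dict[str, float]) -> Dict[str, float]:
    """Express each splice variant as a fraction of all variants of the gene."""
    total = sum(quantities.values())
    if total <= 0:
        raise ValueError("total transcript quantity must be positive")
    return {variant: amount / total for variant, amount in quantities.items()}


# Hypothetical transcript quantities for two splice variants of one gene.
fractions = splice_variant_fractions({"variant_1": 30.0, "variant_2": 10.0})
print(fractions)
```

Fractions of this kind could then enter a score as one weighted term among others.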

Various objects, features, aspects and advantages of the inventive subject matter will become more apparent from the following detailed description of preferred embodiments.

DETAILED DESCRIPTION

The inventors discovered that the status and/or prognosis of a cancer can be more reliably determined in a less invasive and quick manner using a compound score that is generated based on multiple factors associated with the cancer. The inventors also discovered that the compound score can be used to reliably predict a likelihood of outcome of a cancer treatment, and further, effectiveness of a particular cancer treatment. Viewed from a different perspective, the inventors discovered that a compound score can be generated from the patient's omics data obtained from nucleic acids in the patient's blood. Typically the omics data include omics data of various cancer-related genes, which can be differentially weighed based on the type and timing of the sampling. The compound score can be a reliable indicator to determine cancer status and/or prognosis of a cancer, and/or a likelihood of outcome of a cancer treatment. Further, the compound scores generated based on omics data obtained before and after a cancer treatment can be compared to determine the effectiveness of a cancer treatment.

As used herein, the term “tumor” refers to, and is interchangeably used with, one or more cancer cells, cancer tissues, malignant tumor cells, or malignant tumor tissue, that can be placed or found in one or more anatomical locations in a human body.

It should be noted that the term “patient” as used herein includes both individuals that are diagnosed with a condition (e.g., cancer) as well as individuals undergoing examination and/or testing for the purpose of detecting or identifying a condition. Thus, a patient having a tumor refers to both individuals that are diagnosed with a cancer as well as individuals that are suspected to have a cancer.

As used herein, the term “provide” or “providing” refers to and includes any acts of manufacturing, generating, placing, enabling to use, transferring, or making ready to use.

Cell-Free DNA/RNA

The inventors contemplate that tumor cells and/or some immune cells interacting with or surrounding the tumor cells release cell free DNA/RNA to the patient's bodily fluid, and thus may increase the quantity of the specific cell free DNA/RNA in the patient's bodily fluid as compared to a healthy individual. As used herein, the patient's bodily fluid includes, but is not limited to, blood, serum, plasma, mucus, cerebrospinal fluid, ascites fluid, saliva, and urine of the patient. Alternatively, it should be noted that various other bodily fluids are also deemed appropriate so long as cell free DNA/RNA is present in such fluids. The patient's bodily fluid may be fresh or preserved/frozen.

The cell free DNA/RNA may include any types of DNA/RNA that are circulating in the bodily fluid of a person without being enclosed in a cell body or a nucleus. Most typically, the source of the cell free DNA/RNA is the tumor cells. However, it is also contemplated that the source of the cell free DNA/RNA is an immune cell (e.g., NK cells, T cells, macrophages, etc.). Thus, the cell free DNA/RNA can be circulating tumor DNA/RNA (ctDNA/RNA) and/or circulating free DNA/RNA (cf DNA/RNA, circulating nucleic acids that do not derive from a tumor). While not wishing to be bound by a particular theory, it is contemplated that release of cell free DNA/RNA originating from a tumor cell can be increased when the tumor cell interacts with an immune cell or when the tumor cells undergo cell death (e.g., necrosis, apoptosis, autophagy, etc.). Thus, in some embodiments, the cell free DNA/RNA may be enclosed in a vesicular structure (e.g., via exosomal release of cytoplasmic substances) so that it can be protected from nuclease (e.g., RNase) activity in some type of bodily fluid. Yet, it is also contemplated that in other aspects, the cell free DNA/RNA is a naked DNA/RNA without being enclosed in any membranous structure, but may be in a stable form by itself or be stabilized via interaction with one or more non-nucleotide molecules (e.g., any RNA binding proteins, etc.).

It is contemplated that the cell free DNA/RNA can be any type of DNA/RNA which can be released from either cancer cells or immune cells. Thus, the cell free DNA may include any whole or fragmented genomic DNA, or mitochondrial DNA, and the cell free RNA may include mRNA, tRNA, microRNA, small interfering RNA, and long non-coding RNA (lncRNA). Most typically, the cell free DNA is a fragmented DNA typically with a length of at least 50 base pairs (bp), 100 bp, 200 bp, 500 bp, or 1 kbp. Also, it is contemplated that the cell free RNA is a full length or a fragment of mRNA (e.g., at least 70% of full length, at least 50% of full length, at least 30% of full length, etc.). While cell free DNA/RNA may include any type of DNA/RNA encoding any cellular, extracellular proteins or non-protein elements, it is preferred that at least some of cell free DNA/RNA encodes one or more cancer-related proteins, or inflammation-related proteins. For example, the cell free DNA/mRNA may be full-length or fragments of (or derived from the) cancer related genes including, but not limited to ABL1, ABL2, ACTB, ACVR1B, AKT1, AKT2, AKT3, ALK, AMER11, APC, AR, ARAF, ARFRP1, ARID1A, ARID1B, ASXL1, ATF1, ATM, ATR, ATRX, AURKA, AURKB, AXIN1, AXL, BAP1, BARD1, BCL2, BCL2L1, BCL2L2, BCL6, BCOR, BCORL1, BLM, BMPR1A, BRAF, BRCA1, BRCA2, BRD4, BRIP1, BTG1, BTK, EMSY, CARD11, CBFB, CBL, CCND1, CCND2, CCND3, CCNE1, CD274, CD79A, CD79B, CDC73, CDH1, CDK12, CDK4, CDK6, CDK8, CDKN1A, CDKN1B, CDKN2A, CDKN2B, CDKN2C, CEA, CEBPA, CHD2, CHD4, CHEK1, CHEK2, CIC, CREBBP, CRKL, CRLF2, CSF1R, CTCF, CTLA4, CTNNA1, CTNNB1, CUL3, CYLD, DAXX, DDR2, DEPTOR, DICER1, DNMT3A, DOT1L, EGFR, EP300, EPCAM, EPHA3, EPHA5, EPHA7, EPHB1, ERBB2, ERBB3, ERBB4, EREG, ERG, ERRFI1, ESR1, EWSR1, EZH2, FAM46C, FANCA, FANCC, FANCD2, FANCE, FANCF, FANCG, FANCL, FAS, FAT1, FBXW7, FGF10, FGF14, FGF19, FGF23, FGF3, FGF4, FGF6, FGFR1, FGFR2, FGFR3, FGFR4, FH, FLCN, FLI1, FLT1, FLT3, FLT4, FOLH1, FOXL2, FOXP1, FRS2, FUBP1, GABRA6, GATA1, GATA2, GATA3, GATA4, GATA6, GID4, GLI1, GNA11, GNA13, GNAQ, GNAS, GPR124, GRIN2A, GRM3, GSK3B, H3F3A, HAVCR2, HGF, HMGB1, HMGB2, HMGB3, HNF1A, HRAS, HSD3B1, HSP90AA1, IDH1, IDH2, IDO, IGF1R, IGF2, IKBKE, IKZF1, IL7R, INHBA, INPP4B, IRF2, IRF4, IRS2, JAK1, JAK2, JAK3, JUN, MYST3, KDM5A, KDM5C, KDM6A, KDR, KEAP, KEL, KIT, KLHL6, KLK3, MLL, MLL2, MLL3, KRAS, LAG3, LMO1, LRP1B, LYN, LZTR1, MAGI2, MAP2K1, MAP2K2, MAP2K4, MAP3K1, MCL1, MDM2, MDM4, MED12, MEF2B, MEN1, MET, MITF, MLH1, MPL, MRE11A, MSH2, MSH6, MTOR, MUC1, MUTYH, MYC, MYCL, MYCN, MYD88, MYH, NF1, NF2, NFE2L2, NFKB1A, NKX2-1, NOTCH1, NOTCH2, NOTCH3, NPM1, NRAS, NSD1, NTRK1, NTRK2, NTRK3, NUP93, PAK3, PALB2, PARK2, PAX3, PAX, PBRM1, PDGFRA, PDCD1, PDCD1LG2, PDGFRB, PDK1, PGR, PIK3C2B, PIK3CA, PIK3CB, PIK3CG, PIK3R1, PIK3R2, PLCG2, PMS2, POLD1, POLE, PPP2R1A, PREX2, PRKAR1A, PRKC1, PRKDC, PRSS8, PTCH1, PTEN, PTPN11, QK1, RAC1, RAD50, RAD51, RAF1, RANBP1, RARA, RB1, RBM10, RET, RICTOR, RIT1, RNF43, ROS1, RPTOR, RUNX1, RUNX1T1, SDHA, SDHB, SDHC, SDHD, SETD2, SF3B1, SLIT2, SMAD2, SMAD3, SMAD4, SMARCA4, SMARCB1, SMO, SNCAIP, SOCS1, SOX10, SOX2, SOX9, SPEN, SPOP, SPTA1, SRC, STAG2, STAT3, STAT4, STK11, SUFU, SYK, T (BRACHYURY), TAF1, TBX3, TERC, TERT, TET2, TGFRB2, TNFAIP3, TNFRSF14, TOP1, TOP2A, TP53, TSC1, TSC2, TSHR, U2AF1, VEGFA, VHL, WISP3, WT1, XPO1, ZBTB2, ZNF217, ZNF703, CD26, CD49F, CD44, CD49F, CD13, CD15, CD29, CD151, CD138, CD166, CD133, CD45, CD90, CD24, CD44, CD38, CD47, CD96, CD45, CD90, ABCB5, ABCG2, ALCAM, ALPHA-FETOPROTEIN, DLL1, DLL3, DLL4, ENDOGLIN, GJA1, OVASTACIN, AMACR, NESTIN, STRO-1, MICL, ALDH, BMI-1, GLI-2, CXCR1, CXCR2, CX3CR1, CX3CL1, CXCR4, PON1, TROP1, LGR5, MSI-1, C-MAF, TNFRSF7, TNFRSF16, SOX2, PODOPLANIN, L1CAM, HIF-2 ALPHA, TFRC, ERCC1, TUBB3, TOP1, TOP2A, TOP2B, ENOX2, TYMP, TYMS, FOLR1, GPNMB, PAPPA, GART, EBNA1, EBNA2, LMP1, BAGE, BAGE2, BCMA, C10ORF54, CD4, CD8, CD19, CD20, CD25, CD30, CD33, CD80, CD86, CD123, CD276, CCL1, CCL2, CCL3, CCL4, CCL5, CCL7, CCL8, CCL11, CCL13, CCL14, CCL15, CCL16, CCL17, CCL18, CCL19, CCL20, CCL21, CCL22, CCL23, CCL24, CCL25, CCL26, CCL27, CCL28, CCR1, CCR2, CCR3, CCR4, CCR5, CCR6, CCR7, CCR8, CCR9, CCR10, CXCL1, CXCL2, CXCL3, CXCL5, CXCL6, CXCL9, CXCL10, CXCL11, CXCL12, CXCL13, CXCL14, CXCL16, CXCL17, CXCR3, CXCR5, CXCR6, CTAG1B, CTAG2, CTAG1, CTAG4, CTAG5, CTAG6, CTAG9, CAGE1, GAGE1, GAGE2A, GAGE2B, GAGE2C, GAGE2D, GAGE2E, GAGE4, GAGE10, GAGE12D, GAGE12F, GAGE12J, GAGE13, HHLA2, ICOSLG, LAG1, MAGEA10, MAGEA12, MAGEA1, MAGEA2, MAGEA3, MAGEA4, MAGEA4, MAGEA5, MAGEA6, MAGEA7, MAGEA8, MAGEA9, MAGEB1, MAGEB2, MAGEB3, MAGEB4, MAGEB6, MAGEB10, MAGEB16, MAGEB18, MAGEC1, MAGEC2, MAGEC3, MAGED1, MAGED2, MAGED4, MAGED4B, MAGEE1, MAGEE2, MAGEF1, MAGEH1, MAGEL2, NCR3LG1, SLAMF7, SPAG1, SPAG4, SPAG5, SPAG6, SPAG7, SPAG8, SPAG9, SPAG11A, SPAG11B, SPAG16, SPAG17, VTCN1, XAGE1D, XAGE2, XAGE3, XAGE5, XCL1, XCL2, and XCR1. Of course, it should be appreciated that the above genes may be wild type or mutated versions, including missense or nonsense mutations, insertions, deletions, fusions, and/or translocations, all of which may or may not cause formation of full-length mRNA when transcribed.

For another example, some cell free DNAs/mRNAs are fragments of or those encoding a full length or a fragment of inflammation-related proteins, including, but not limited to, HMGB1, HMGB2, HMGB3, MUC1, VWF, MMP, CRP, PBEF1, TNF-α, TGF-β, PDGFA, IL-1, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-8, IL-9, IL-10, IL-12, IL-13, IL-15, IL-17, Eotaxin, FGF, G-CSF, GM-CSF, IFN-γ, IP-10, MCP-1, PDGF, and hTERT, and in yet another example, the cell free mRNA encodes a full length or a fragment of HMGB1.

For still another example, some cell free DNAs/mRNAs are fragments of or those encoding a full length or a fragment of DNA repair-related proteins or RNA repair-related proteins. Table 1 provides an exemplary collection of predominant DNA repair genes and their associated repair pathways contemplated herein, but it should be recognized that numerous other genes associated with DNA repair and repair pathways are also expressly contemplated herein, and Tables 2 and 3 illustrate further exemplary genes for analysis and their associated function in DNA repair.

TABLE 1
Repair mechanism | Predominant DNA Repair genes
Base excision repair (BER) | DNA glycosylase, APE1, XRCC1, PNKP, Tdp1, APTX, DNA polymerase β, FEN1, DNA polymerase δ or ε, PCNA-RFC, PARP
Mismatch repair (MMR) | MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MutLα (MLH1-PMS2), MutLβ (MLH1-PMS2), MutLγ (MLH1-MLH3), Exo1, PCNA-RFC
Nucleotide excision repair (NER) | XPC-Rad23B-CEN2, UV-DDB (DDB1-XPE), CSA, CSB, TFIIH, XPB, XPD, XPA, RPA, XPG, ERCC1-XPF, DNA polymerase δ or ε
Homologous recombination (HR) | Mre11-Rad50-Nbs1, CtIP, RPA, Rad51, Rad52, BRCA1, BRCA2, Exo1, BLM-TopIIIα, GEN1-Yen1, Slx1-Slx4, Mus81/Eme1
Non-homologous end-joining (NHEJ) | Ku70-Ku80, DNA-PKc, XRCC4-DNA ligase IV, XLF

TABLE 2
Gene name (synonyms) | Activity | Accession number

Base excision repair (BER)
DNA glycosylases: major altered base released
UNG | U excision | NM_003362
SMUG1 | U excision | NM_014311
MBD4 | U or T opposite G at CpG sequences | NM_003925
TDG | U, T or ethenoC opposite G | NM_003211
OGG1 | 8-oxoG opposite C | NM_002542
MYH | A opposite 8-oxoG | NM_012222
NTH1 | Ring-saturated or fragmented pyrimidines | NM_002528
MPG | 3-meA, ethenoA, hypoxanthine | NM_002434
Other BER factors
APE1 (HAP1, APEX, REF1) | AP endonuclease | NM_001641
APE2 (APEXL2) | AP endonuclease | NM_014481
LIG3 | Main ligation function | NM_013975
XRCC1 | Main ligation function | NM_006297

Poly(ADP-ribose) polymerase (PARP) enzymes
ADPRT | Protects strand interruptions | NM_001618
ADPRTL2 | PARP-like enzyme | NM_005485
ADPRTL3 | PARP-like enzyme | AF085734

Direct reversal of damage
MGMT | O6-meG alkyltransferase | NM_002412

Mismatch excision repair (MMR)
MSH2 | Mismatch and loop recognition | NM_000251
MSH3 | Mismatch and loop recognition | NM_002439
MSH6 | Mismatch recognition | NM_000179
MSH4 | MutS homolog specialized for meiosis | NM_002440
MSH5 | MutS homolog specialized for meiosis | NM_002441
PMS1 | Mitochondrial MutL homolog | NM_000534
MLH1 | MutL homolog | NM_000249
PMS2 | MutL homolog | NM_000535
MLH3 | MutL homolog of unknown function | NM_014381
PMS2L3 | MutL homolog of unknown function | D38437
PMS2L4 | MutL homolog of unknown function | D38438

Nucleotide excision repair (NER)
XPC | Binds damaged DNA as complex | NM_004628
RAD23B (HR23B) | Binds damaged DNA as complex | NM_002874
CETN2 | Binds damaged DNA as complex | NM_004344
RAD23A (HR23A) | Substitutes for HR23B | NM_005053
XPA | Binds damaged DNA in preincision complex | NM_000380
RPA1 | Binds DNA in preincision complex | NM_002945
RPA2 | Binds DNA in preincision complex | NM_002946
RPA3 | Binds DNA in preincision complex | NM_002947
TFIIH | Catalyzes unwinding in preincision complex |
XPB (ERCC3) | 3′ to 5′ DNA helicase | NM_000122
XPD (ERCC2) | 5′ to 3′ DNA helicase | X52221
GTF2H1 | Core TFIIH subunit p62 | NM_005316
GTF2H2 | Core TFIIH subunit p44 | NM_001515
GTF2H3 | Core TFIIH subunit p34 | NM_001516
GTF2H4 | Core TFIIH subunit p52 | NM_001517
CDK7 | Kinase subunit of TFIIH | NM_001799
CCNH | Kinase subunit of TFIIH | NM_001239
MNAT1 | Kinase subunit of TFIIH | NM_002431
XPG (ERCC5) | 3′ incision | NM_000123
ERCC1 | 5′ incision subunit | NM_001983
XPF (ERCC4) | 5′ incision subunit | NM_005236
LIG1 | DNA joining | NM_000234
NER-related
CSA (CKN1) | Cockayne syndrome; needed for transcription-coupled NER | NM_000082
CSB (ERCC6) | Cockayne syndrome; needed for transcription-coupled NER | NM_000124
XAB2 (HCNP) | Cockayne syndrome; needed for transcription-coupled NER | NM_020196
DDB1 | Complex defective in XP group E | NM_001923
DDB2 | Mutated in XP group E | NM_000107
MMS19 | Transcription and NER | AW852889

Homologous recombination
RAD51 | Homologous pairing | NM_002875
RAD51L1 (RAD51B) | Rad51 homolog | U84138
RAD51C | Rad51 homolog | NM_002876
RAD51L3 (RAD51D) | Rad51 homolog | NM_002878
DMC1 | Rad51 homolog, meiosis | NM_007068
XRCC2 | DNA break and cross-link repair | NM_005431
XRCC3 | DNA break and cross-link repair | NM_005432
RAD52 | Accessory factor for recombination | NM_002879
RAD54L | Accessory factor for recombination | NM_003579
RAD54B | Accessory factor for recombination | NM_012415
BRCA1 | Accessory factor for transcription and recombination | NM_007295
BRCA2 | Cooperation with RAD51, essential function | NM_000059
RAD50 | ATPase in complex with MRE11A, NBS1 | NM_005732
MRE11A | 3′ exonuclease | NM_005590
NBS1 | Mutated in Nijmegen breakage syndrome | NM_002485

Nonhomologous end-joining
Ku70 (G22P1) | DNA end binding | NM_001469
Ku80 (XRCC5) | DNA end binding | M30938
PRKDC | DNA-dependent protein kinase catalytic subunit | NM_006904
LIG4 | Nonhomologous end-joining | NM_002312
XRCC4 | Nonhomologous end-joining | NM_003401

Sanitization of nucleotide pools
MTH1 (NUDT1) | 8-oxoGTPase | NM_002452
DUT | dUTPase | NM_001948

DNA polymerases (catalytic subunits)
POLB | BER in nuclear DNA | NM_002690
POLG | BER in mitochondrial DNA | NM_002693
POLD1 | NER and MMR | NM_002691
POLE1 | NER and MMR | NM_006231
PCNA | Sliding clamp for pol delta and pol epsilon | NM_002592
REV3L (POLZ) | DNA pol zeta catalytic subunit, essential function | NM_002912
REV7 (MAD2L2) | DNA pol zeta subunit | NM_006341
REV1 | dCMP transferase | NM_016316
POLH | XP variant | NM_006502
POLI (RAD30B) | Lesion bypass | NM_007195
POLQ | DNA cross-link repair | NM_006596
DINB1 (POLK) | Lesion bypass | NM_016218
POLL | Meiotic function | NM_013274
POLM | Presumed specialized lymphoid function | NM_013284
TRF4-1 | Sister-chromatid cohesion | AF089896
TRF4-2 | Sister-chromatid cohesion | AF089897

Editing and processing nucleases
FEN1 (DNase IV) | 5′ nuclease | NM_004111
TREX1 (DNase III) | 3′ exonuclease | NM_007248
TREX2 | 3′ exonuclease | NM_007205
EXO1 (HEX1) | 5′ exonuclease | NM_003686
SPO11 | endonuclease | NM_012444

Rad6 pathway
UBE2A (RAD6A) | Ubiquitin-conjugating enzyme | NM_003336
UBE2B (RAD6B) | Ubiquitin-conjugating enzyme | NM_003337
RAD18 | Assists repair or replication of damaged DNA | AB035274
UBE2VE (MMS2) | Ubiquitin-conjugating complex | AF049140
UBE2N (UBC13, BTG1) | Ubiquitin-conjugating complex | NM_003348

Genes defective in diseases associated with sensitivity to DNA damaging agents
BLM | Bloom syndrome helicase | NM_000057
WRN | Werner syndrome helicase/3′-exonuclease | NM_000553
RECQL4 | Rothmund-Thompson syndrome | NM_004260
ATM | Ataxia telangiectasia | NM_000051

Fanconi anemia
FANCA | Involved in tolerance or repair of DNA cross-links | NM_000135
FANCB | Involved in tolerance or repair of DNA cross-links | N/A
FANCC | Involved in tolerance or repair of DNA cross-links | NM_000136
FANCD | Involved in tolerance or repair of DNA cross-links | N/A
FANCE | Involved in tolerance or repair of DNA cross-links | NM_021922
FANCF | Involved in tolerance or repair of DNA cross-links | AF181994
FANCG (XRCC9) | Involved in tolerance or repair of DNA cross-links | NM_004629

Other identified genes with a suspected DNA repair function
SNM1 (PSO2) | DNA cross-link repair | D42045
SNM1B | Related to SNM1 | AL137856
SNM1C | Related to SNM1 | AA315885
RPA4 | Similar to RPA2 | NM_013347
ABH (ALKB) | Resistance to alkylation damage | X91992
PNKP | Converts some DNA breaks to ligatable ends | NM_007254

Other conserved DNA damage response genes
ATR | ATM- and PI-3K-like essential kinase | NM_001184
RAD1 (S. pombe) homolog | PCNA-like DNA damage sensor | NM_002853
RAD9 (S. pombe) homolog | PCNA-like DNA damage sensor | NM_004584
HUS1 (S. pombe) homolog | PCNA-like DNA damage sensor | NM_004507
RAD17 (RAD24) | RFC-like DNA damage sensor | NM_002873
TP53BP1 | BRCT protein | NM_005657
CHEK1 | Effector kinase | NM_001274
CHK2 (Rad53) | Effector kinase | NM_007194

TABLE 3 Gene Name Gene Title Biological Activity RFC2 replication factorC (activator 1) 2, DNA replication 40 kDa XRCC6 X-ray repaircomplementing DNA ligation /// DNA repair /// double-strand breakdefective repair in Chinese hamster repair via nonhomologous end-joining/// DNA cells 6 (Ku autoantigen, 70 kDa) recombination /// positiveregulation of transcription, DNA-dependent /// double-strand breakrepair via nonhomologous end-joining /// response to DNA damage stimulus/// DNA recombination APOBEC apolipoprotein B mRNA editing For all ofAPOBEC1, APOBEC2, APOBEC3A-H, enzyme, catalytic polypeptide-like andAPOBEC4, cytidine deaminases. POLD2 polymerase (DNA directed), delta 2,DNA replication /// DNA replication regulatory subunit 50 kDa PCNAproliferating cell nuclear antigen regulation of progression throughcell cycle /// DNA replication /// regulation of DNA replication /// DNArepair /// cell proliferation /// phosphoinositide-mediated signaling/// DNA replication RPA1 replication protein A1, 70 kDa DNA-dependentDNA replication /// DNA repair /// DNA recombination /// DNA replicationRPA1 replication protein A1, 70 kDa DNA-dependent DNA replication ///DNA repair /// DNA recombination /// DNA replication RPA2 replicationprotein A2, 32 kDa DNA replication /// DNA-dependent DNA replicationERCC3 excision repair cross-complementing DNA topological change ///transcription-coupled rodent repair deficiency, nucleotide-excisionrepair /// transcription /// complementation group 3 (xerodermaregulation of transcription, DNA-dependent /// pigmentosum group Btranscription from RNA polymerase II promoter /// complementing)induction of apoptosis /// sensory perception of sound /// DNA repair/// nucleotide-excision repair /// response to DNA damage stimulus ///DNA repair UNG uracil-DNA glycosylase carbohydrate metabolism /// DNArepair /// base-excision repair /// response to DNA damage stimulus ///DNA repair /// DNA repair ERCC5 excision repair cross-complementingtranscription-coupled 
nucleotide-excision repair /// rodent repairdeficiency, nucleotide-excision repair /// sensory perception ofcomplementation group 5 (xeroderma sound /// DNA repair /// response toDNA damage pigmentosum, complementation stimulus /// nucleotide-excisionrepair group G (Cockayne syndrome)) MLH1 mutL homolog 1, colon cancer,mismatch repair /// cell cycle /// negative regulation nonpolyposis type2 (E. coli) of progression through cell cycle /// DNA repair ///mismatch repair /// response to DNA damage stimulus LIG1 ligase I, DNA,ATP-dependent DNA replication /// DNA repair /// DNA recombination ///cell cycle /// morphogenesis /// cell division /// DNA repair ///response to DNA damage stimulus /// DNA metabolism NBN nibrin DNA damagecheckpoint /// cell cycle checkpoint /// double-strand break repair NBNnibrin DNA damage checkpoint /// cell cycle checkpoint /// double-strandbreak repair NBN nibrin DNA damage checkpoint /// cell cycle checkpoint/// double-strand break repair MSH6 mutS homolog 6 (E. 
coli) mismatchrepair /// DNA metabolism /// DNA repair /// mismatch repair ///response to DNA damage stimulus POLD4 polymerase (DNA-directed), delta 4DNA replication /// DNA replication RFC5 replication factor C(activator 1) 5, DNA replication /// DNA repair /// DNA replication 36.5kDa RFC5 replication factor C (activator 1) 5, DNA replication /// DNArepair /// DNA replication 36.5 kDa DDB2 /// damage-specific DNA bindingnucleotide-excision repair /// regulation of LHX3 protein 2, 48 kDa ///LIM homeobox 3 transcription, DNA-dependent /// organ morphogenesis ///DNA repair /// response to DNA damage stimulus /// DNA repair ///transcription /// regulation of transcription POLD1 polymerase (DNAdirected), delta 1, DNA replication /// DNA repair /// response to UV/// catalytic subunit 125 kDa DNA replication FANCG Fanconi anemia,complementation cell cycle checkpoint /// DNA repair /// DNA repair ///group G response to DNA damage stimulus /// regulation of progressionthrough cell cycle POLB polymerase (DNA directed), beta DNA-dependentDNA replication /// DNA repair /// DNA replication /// DNA repair ///response to DNA damage stimulus XRCC1 X-ray repair complementing singlestrand break repair defective repair in Chinese hamster cells 1 MPGN-methylpurine-DNA glycosylase base-excision repair /// DNA dealkylation/// DNA repair /// base-excision repair /// response to DNA damagestimulus RFC2 replication factor C (activator 1) 2, DNA replication 40kDa ERCC1 excision repair cross-complementing nucleotide-excision repair/// morphogenesis /// rodent repair deficiency, nucleotide-excisionrepair /// DNA repair /// complementation group 1 (includes response toDNA damage stimulus overlapping antisense sequence) TDG thymine-DNAglycosylase carbohydrate metabolism /// base-excision repair /// DNArepair /// response to DNA damage stimulus TDG thymine-DNA glycosylasecarbohydrate metabolism /// base-excision repair /// DNA repair ///response to DNA damage stimulus FANCA Fanconi anemia, 
complementationDNA repair /// protein complex assembly /// DNA group A /// Fanconianemia, repair /// response to DNA damage stimulus complementation groupA RFC4 replication factor C (activator 1) 4, DNA replication /// DNAstrand elongation /// DNA 37 kDa repair /// phosphoinositide-mediatedsignaling /// DNA replication RFC3 replication factor C (activator 1) 3,DNA replication /// DNA strand elongation 38 kDa RFC3 replication factorC (activator 1) 3, DNA replication /// DNA strand elongation 38 kDaAPEX2 APEX nuclease DNA repair /// response to DNA damage stimulus(apurinic/apyrimidinic endonuclease) 2 RAD1 RAD1 homolog (S. pombe) DNArepair /// cell cycle checkpoint /// cell cycle checkpoint /// DNAdamage checkpoint /// DNA repair /// response to DNA damage stimulus ///meiotic prophase I RAD1 RAD1 homolog (S. pombe) DNA repair /// cellcycle checkpoint /// cell cycle checkpoint /// DNA damage checkpoint ///DNA repair /// response to DNA damage stimulus /// meiotic prophase IBRCA1 breast cancer 1, early onset regulation of transcription from RNApolymerase II promoter /// regulation of transcription from RNApolymerase III promoter /// DNA damage response, signal transduction byp53 class mediator resulting in transcription of p21 class mediator ///cell cycle /// protein ubiquitination /// androgen receptor signalingpathway /// regulation of cell proliferation /// regulation of apoptosis/// positive regulation of DNA repair /// negative regulation ofprogression through cell cycle /// positive regulation of transcription,DNA-dependent /// negative regulation of centriole replication /// DNAdamage response, signal transduction resulting in induction of apoptosis/// DNA repair /// response to DNA damage stimulus /// proteinubiquitination /// DNA repair /// regulation of DNA repair /// apoptosis/// response to DNA damage stimulus EXO1 exonuclease 1 DNA repair ///DNA repair /// mismatch repair /// DNA recombination FEN1 flapstructure-specific endonuclease 1 DNA replication /// 
double-strandbreak repair /// UV protection /// phosphoinositide-mediated signaling/// DNA repair /// DNA replication /// DNA repair /// DNA repair FEN1flap structure-specific endonuclease 1 DNA replication /// double-strandbreak repair /// UV protection /// phosphoinositide-mediated signaling/// DNA repair /// DNA replication /// DNA repair /// DNA repair MLH3mutL homolog 3 (E. coli) mismatch repair /// meiotic recombination ///DNA repair /// mismatch repair /// response to DNA damage stimulus ///mismatch repair MGMT O-6-methylguanine-DNA DNA ligation /// DNA repair/// response to DNA methyltransferase damage stimulus RAD51 RAD51homolog (RecA homolog, double-strand break repair via homologous E.coli) (S. cerevisiae) recombination /// DNA unwinding during replication/// DNA repair /// mitotic recombination /// meiosis /// meioticrecombination /// positive regulation of DNA ligation /// proteinhomooligomerization /// response to DNA damage stimulus /// DNAmetabolism /// DNA repair /// response to DNA damage stimulus /// DNArepair /// DNA recombination /// meiotic recombination /// double-strandbreak repair via homologous recombination /// DNA unwinding duringreplication RAD51 RAD51 homolog (RecA homolog, double-strand breakrepair via homologous E. coli) (S. 
cerevisiae) recombination /// DNAunwinding during replication /// DNA repair /// mitotic recombination/// meiosis /// meiotic recombination /// positive regulation of DNAligation /// protein homooligomerization /// response to DNA damagestimulus /// DNA metabolism /// DNA repair /// response to DNA damagestimulus /// DNA repair /// DNA recombination /// meiotic recombination/// double-strand break repair via homologous recombination /// DNAunwinding during replication XRCC4 X-ray repair complementing DNA repair/// double-strand break repair /// DNA defective repair in Chinesehamster recombination /// DNA recombination /// response cells 4 to DNAdamage stimulus XRCC4 X-ray repair complementing DNA repair ///double-strand break repair /// DNA defective repair in Chinese hamsterrecombination /// DNA recombination /// response cells 4 to DNA damagestimulus RECQL RecQ protein-like (DNA helicase DNA repair /// DNAmetabolism Q1-like) ERCC8 excision repair cross-complementing DNA repair/// transcription /// regulation of rodent repair deficiency,transcription, DNA-dependent /// sensory perception complementationgroup 8 of sound /// transcription-coupled nucleotide- excision repairFANCC Fanconi anemia, complementation DNA repair /// DNA repair ///protein complex group C assembly /// response to DNA damage stimulusOGG1 8-oxoguanine DNA glycosylase carbohydrate metabolism ///base-excision repair /// DNA repair /// base-excision repair ///response to DNA damage stimulus /// DNA repair MRE11A MRE11 meioticrecombination 11 regulation of mitotic recombination /// double- homologA (S. 
cerevisiae) strand break repair via nonhomologous end-joining ///telomerase-dependent telomere maintenance /// meiosis /// meioticrecombination /// DNA metabolism /// DNA repair /// double-strand breakrepair /// response to DNA damage stimulus /// DNA repair ///double-strand break repair /// DNA recombination RAD52 RAD52 homolog (S.cerevisiae) double-strand break repair /// mitotic recombination ///meiotic recombination /// DNA repair /// DNA recombination /// responseto DNA damage stimulus WRN Werner syndrome DNA metabolism /// aging XPAxeroderma pigmentosum, nucleotide-excision repair /// DNA repair ///complementation group A response to DNA damage stimulus /// DNA repair/// nucleotide-excision repair BLM Bloom syndrome DNA replication ///DNA repair /// DNA recombination /// antimicrobial humoral response(sensu Vertebrata) /// DNA metabolism /// DNA replication OGG18-oxoguanine DNA glycosylase carbohydrate metabolism /// base-excisionrepair /// DNA repair /// base-excision repair /// response to DNAdamage stimulus /// DNA repair MSH3 mutS homolog 3 (E. coli) mismatchrepair /// DNA metabolism /// DNA repair /// mismatch repair ///response to DNA damage stimulus POLE2 polymerase (DNA directed), epsilonDNA replication /// DNA repair /// DNA replication 2 (p59 subunit)RAD51C RAD51 homolog C (S. 
cerevisiae) DNA repair /// DNA recombination/// DNA metabolism /// DNA repair /// DNA recombination /// response toDNA damage stimulus LIG4 ligase IV, DNA, ATP-dependent single strandbreak repair /// DNA replication /// DNA recombination /// cell cycle/// cell division /// DNA repair /// response to DNA damage stimulusERCC6 excision repair cross-complementing DNA repair /// transcription/// regulation of rodent repair deficiency, transcription, DNA-dependent/// transcription from complementation group 6 RNA polymerase IIpromoter /// sensory perception of sound LIG3 ligase III, DNA,ATP-dependent DNA replication /// DNA repair /// cell cycle /// meioticrecombination /// spermatogenesis /// cell division /// DNA repair ///DNA recombination /// response to DNA damage stimulus RAD17 RAD17homolog (S. pombe) DNA replication /// DNA repair /// cell cycle ///response to DNA damage stimulus XRCC2 X-ray repair complementing DNArepair /// DNA recombination /// meiosis /// defective repair in Chinesehamster DNA metabolism /// DNA repair /// response to cells 2 DNA damagestimulus MUTYH mutY homolog (E. 
coli) carbohydrate metabolism ///base-excision repair /// mismatch repair /// cell cycle /// negativeregulation of progression through cell cycle /// DNA repair /// responseto DNA damage stimulus /// DNA repair RFC1 replication factor C(activator 1) 1, DNA-dependent DNA replication /// transcription /// 145kDa /// replication factor C regulation of transcription, DNA-dependent/// (activator 1) 1, 145 kDa telomerase-dependent telomere maintenance/// DNA replication /// DNA repair RFC1 replication factor C(activator 1) 1, DNA-dependent DNA replication /// transcription /// 145kDa regulation of transcription, DNA-dependent /// telomerase-dependenttelomere maintenance /// DNA replication /// DNA repair BRCA2 breastcancer 2, early onset regulation of progression through cell cycle ///double-strand break repair via homologous recombination /// DNA repair/// establishment and/or maintenance of chromatin architecture ///chromatin remodeling /// regulation of S phase of mitotic cell cycle ///mitotic checkpoint /// regulation of transcription /// response to DNAdamage stimulus RAD50 RAD50 homolog (S. 
cerevisiae) regulation ofmitotic recombination /// double- strand break repair ///telomerase-dependent telomere maintenance /// cell cycle /// meiosis ///meiotic recombination /// chromosome organization and biogenesis ///telomere maintenance /// DNA repair /// response to DNA damage stimulus/// DNA repair /// DNA recombination DDB1 damage-specific DNA bindingnucleotide-excision repair /// ubiquitin cycle /// protein 1, 127 kDaDNA repair /// response to DNA damage stimulus /// DNA repair XRCC5X-ray repair complementing double-strand break repair via nonhomologousend- defective repair in Chinese hamster joining /// DNA recombination/// DNA repair /// cells 5 (double-strand-break DNA recombination ///response to DNA damage rejoining; Ku autoantigen, 80 kDa) stimulus ///double-strand break repair XRCC5 X-ray repair complementingdouble-strand break repair via nonhomologous end- defective repair inChinese hamster joining /// DNA recombination /// DNA repair /// cells 5(double-strand-break DNA recombination /// response to DNA damagerejoining; Ku autoantigen, 80 kDa) stimulus /// double-strand breakrepair PARP1 poly (ADP-ribose) polymerase DNA repair /// transcriptionfrom RNA polymerase family, member 1 II promoter /// protein amino acidADP-ribosylation /// DNA metabolism /// DNA repair /// protein aminoacid ADP-ribosylation /// response to DNA damage stimulus POLE3polymerase (DNA directed), epsilon DNA replication 3 (p17 subunit) RFC1replication factor C (activator 1) 1, DNA-dependent DNA replication ///transcription /// 145 kDa regulation of transcription, DNA-dependent ///telomerase-dependent telomere maintenance /// DNA replication /// DNArepair RAD50 RAD50 homolog (S. 
cerevisiae) regulation of mitoticrecombination /// double- strand break repair /// telomerase-dependenttelomere maintenance /// cell cycle /// meiosis /// meioticrecombination /// chromosome organization and biogenesis /// telomeremaintenance /// DNA repair /// response to DNA damage stimulus /// DNArepair /// DNA recombination XPC xeroderma pigmentosum,nucleotide-excision repair /// DNA repair /// complementation group Cnucleotide-excision repair /// response to DNA damage stimulus /// DNArepair MSH2 mutS homolog 2, colon cancer, mismatch repair ///postreplication repair /// cell nonpolyposis type 1 (E. coli) cycle ///negative regulation of progression through cell cycle /// DNA metabolism/// DNA repair /// mismatch repair /// response to DNA damage stimulus/// DNA repair RPA3 replication protein A3, 14 kDa DNA replication ///DNA repair /// DNA replication MBD4 methyl-CpG binding domain protein 4base-excision repair /// DNA repair /// response to DNA damage stimulus/// DNA repair MBD4 methyl-CpG binding domain protein 4 base-excisionrepair /// DNA repair /// response to DNA damage stimulus /// DNA repairNTHL1 nth endonuclease III-like 1 (E. coli) carbohydrate metabolism ///base-excision repair /// nucleotide-excision repair, DNA incision, 5′-tolesion /// DNA repair /// response to DNA damage stimulus PMS2 /// PMS2postmeiotic segregation mismatch repair /// cell cycle /// negativeregulation PMS2CL increased 2 (S. cerevisiae) /// of progression throughcell cycle /// DNA repair /// PMS2-C terminal-like mismatch repair ///response to DNA damage stimulus /// mismatch repair RAD51C RAD51 homologC (S. 
cerevisiae) DNA repair /// DNA recombination /// DNA metabolism/// DNA repair /// DNA recombination /// response to DNA damage stimulusUNG2 uracil-DNA glycosylase 2 regulation of progression through cellcycle /// carbohydrate metabolism /// base-excision repair /// DNArepair /// response to DNA damage stimulus APEX1 APEX nuclease(multifunctional base-excision repair /// transcription from RNA DNArepair enzyme) 1 polymerase II promoter /// regulation of DNA binding/// DNA repair /// response to DNA damage stimulus ERCC4 excision repaircross-complementing nucleotide-excision repair /// nucleotide-excisionrodent repair deficiency, repair /// DNA metabolism /// DNA repair ///complementation group 4 response to DNA damage stimulus RAD1 RAD1homolog (S. pombe) DNA repair /// cell cycle checkpoint /// cell cyclecheckpoint /// DNA damage checkpoint /// DNA repair /// response to DNAdamage stimulus /// meiotic prophase I RECQL5 RecQ protein-like 5 DNArepair /// DNA metabolism /// DNA metabolism MSH5 mutS homolog 5 (E.coli) DNA metabolism /// mismatch repair /// mismatch repair /// meiosis/// meiotic recombination /// meiotic prophase II /// meiosis RECQL RecQprotein-like (DNA helicase DNA repair /// DNA metabolism Q1-like) RAD52RAD52 homolog (S. cerevisiae) double-strand break repair /// mitoticrecombination /// meiotic recombination /// DNA repair /// DNArecombination /// response to DNA damage stimulus XRCC4 X-ray repaircomplementing DNA repair /// double-strand break repair /// DNAdefective repair in Chinese hamster recombination /// DNA recombination/// response cells 4 to DNA damage stimulus XRCC4 X-ray repaircomplementing DNA repair /// double-strand break repair /// DNAdefective repair in Chinese hamster recombination /// DNA recombination/// response cells 4 to DNA damage stimulus RAD17 RAD17 homolog (S.pombe) DNA replication /// DNA repair /// cell cycle /// response to DNAdamage stimulus MSH3 mutS homolog 3 (E. 
coli) mismatch repair /// DNAmetabolism /// DNA repair /// mismatch repair /// response to DNA damagestimulus MRE11A MRE11 meiotic recombination 11 regulation of mitoticrecombination /// double- homolog A (S. cerevisiae) strand break repairvia nonhomologous end-joining /// telomerase-dependent telomeremaintenance /// meiosis /// meiotic recombination /// DNA metabolism ///DNA repair /// double-strand break repair /// response to DNA damagestimulus /// DNA repair /// double-strand break repair /// DNArecombination MSH6 mutS homolog 6 (E. coli) mismatch repair /// DNAmetabolism /// DNA repair /// mismatch repair /// response to DNA damagestimulus MSH6 mutS homolog 6 (E. coli) mismatch repair /// DNAmetabolism /// DNA repair /// mismatch repair /// response to DNA damagestimulus RECQL5 RecQ protein-like 5 DNA repair /// DNA metabolism ///DNA metabolism BRCA1 breast cancer 1, early onset regulation oftranscription from RNA polymerase II promoter /// regulation oftranscription from RNA polymerase III promoter /// DNA damage response,signal transduction by p53 class mediator resulting in transcription ofp21 class mediator /// cell cycle /// protein ubiquitination ///androgen receptor signaling pathway /// regulation of cell proliferation/// regulation of apoptosis /// positive regulation of DNA repair ///negative regulation of progression through cell cycle /// positiveregulation of transcription, DNA-dependent /// negative regulation ofcentriole replication /// DNA damage response, signal transductionresulting in induction of apoptosis /// DNA repair /// response to DNAdamage stimulus /// protein ubiquitination /// DNA repair /// regulationof DNA repair /// apoptosis /// response to DNA damage stimulus RAD52RAD52 homolog (S. 
cerevisiae) double-strand break repair /// mitoticrecombination /// meiotic recombination /// DNA repair /// DNArecombination /// response to DNA damage stimulus POLD3 polymerase(DNA-directed), delta 3, DNA synthesis during DNA repair /// mismatchaccessory subunit repair /// DNA replication MSH5 mutS homolog 5 (E.coli) DNA metabolism /// mismatch repair /// mismatch repair /// meiosis/// meiotic recombination /// meiotic prophase II /// meiosis ERCC2excision repair cross-complementing transcription-couplednucleotide-excision repair /// rodent repair deficiency, transcription/// regulation of transcription, DNA- complementation group 2 (xerodermadependent /// transcription from RNA polymerase II pigmentosum D)promoter /// induction of apoptosis /// sensory perception of sound ///nucleobase, nucleoside, nucleotide and nucleic acid metabolism ///nucleotide-excision repair RECQL4 RecQ protein-like 4 DNA repair ///development /// DNA metabolism PMS1 PMS1 postmeiotic segregationmismatch repair /// regulation of transcription, increased 1 (S.cerevisiae) DNA-dependent /// cell cycle /// negative regulation ofprogression through cell cycle /// mismatch repair /// DNA repair ///response to DNA damage stimulus ZFP276 zinc finger protein 276 homologtranscription /// regulation of transcription, (mouse) DNA-dependentMBD4 methyl-CpG binding domain protein 4 base-excision repair /// DNArepair /// response to DNA damage stimulus /// DNA repair MBD4methyl-CpG binding domain protein 4 base-excision repair /// DNA repair/// response to DNA damage stimulus /// DNA repair MLH3 mutL homolog 3(E. 
coli) mismatch repair /// meiotic recombination /// DNA repair ///mismatch repair /// response to DNA damage stimulus /// mismatch repairFANCA Fanconi anemia, complementation DNA repair /// protein complexassembly /// DNA group A repair /// response to DNA damage stimulus POLEpolymerase (DNA directed), epsilon DNA replication /// DNA repair ///DNA replication /// response to DNA damage stimulus XRCC3 X-ray repaircomplementing DNA repair /// DNA recombination /// DNA defective repairin Chinese hamster metabolism /// DNA repair /// DNA recombination ///cells 3 response to DNA damage stimulus /// response to DNA damagestimulus MLH3 mutL homolog 3 (E. coli) mismatch repair /// meioticrecombination /// DNA repair /// mismatch repair /// response to DNAdamage stimulus /// mismatch repair NBN nibrin DNA damage checkpoint ///cell cycle checkpoint /// double-strand break repair SMUG1 single-strandselective carbohydrate metabolism /// DNA repair /// responsemonofunctional uracil DNA to DNA damage stimulus glycosylase FANCFFanconi anemia, complementation DNA repair /// response to DNA damagestimulus group F NEIL1 nei endonuclease VIII-like 1 (E. coli)carbohydrate metabolism /// DNA repair /// response to DNA damagestimulus FANCE Fanconi anemia, complementation DNA repair /// responseto DNA damage stimulus group E MSH5 mutS homolog 5 (E. coli) DNAmetabolism /// mismatch repair /// mismatch repair /// meiosis ///meiotic recombination /// meiotic prophase II /// meiosis RECQL5 RecQprotein-like 5 DNA repair /// DNA metabolism /// DNA metabolism

For still another example, some cell free DNAs/mRNAs are fragments of, or those encoding a full length or a fragment of, a gene not associated with a disease (e.g., housekeeping genes), including, but not limited to, those related to transcription factors (e.g., ATF1, ATF2, ATF4, ATF6, ATF7, ATFIP, BTF3, E2F4, ERH, HMGB1, ILF2, IER2, JUND, TCEB2, etc.), repressors (e.g., PUF60), RNA splicing (e.g., BAT1, HNRPD, HNRPK, PABPN1, SRSF3, etc.), translation factors (e.g., EIF1, EIF1AD, EIF1B, EIF2A, EIF2AK1, EIF2AK3, EIF2AK4, EIF2B2, EIF2B3, EIF2B4, EIF2S2, EIF3A, etc.), tRNA synthetases (e.g., AARS, CARS, DARS, FARS, GARS, HARS, IARS, KARS, MARS, etc.), RNA binding proteins (e.g., ELAVL1, etc.), ribosomal proteins (e.g., RPL5, RPL8, RPL9, RPL10, RPL11, RPL14, RPL25, etc.), mitochondrial ribosomal proteins (e.g., MRPL9, MRPL1, MRPL10, MRPL11, MRPL12, MRPL13, MRPL14, etc.), RNA polymerase (e.g., POLR1C, POLR1D, POLR1E, POLR2A, POLR2B, POLR2C, POLR2D, POLR3C, etc.), protein processing (e.g., PPID, PPI3, PPIF, CANX, CAPN1, NACA, PFDN2, SNX2, SS41, SUMO1, etc.), heat shock proteins (e.g., HSPA4, HSPA5, HSBP1, etc.), histones (e.g., HIST1HSBC, H1FX, etc.), cell cycle (e.g., ARHGAP35, RAB10, RAB11A, CCNY, CCNL, PPP1CA, RAD1, RAD17, etc.), carbohydrate metabolism (e.g., ALDOA, GSK3A, PGK1, PGAM5, etc.), lipid metabolism (e.g., HADHA), citric acid cycle (e.g., SDHA, SDHB, etc.), amino acid metabolism (e.g., COMT, etc.), NADH dehydrogenase (e.g., NDUFA2, etc.), cytochrome c oxidase (e.g., COX5B, COX8, COX11, etc.), ATPase (e.g., ATP2C1, ATP5F1, etc.), lysosome (e.g., CTSD, CSTB, LAMP1, etc.), proteasome (e.g., PSMA1, UBA1, etc.), cytoskeletal proteins (e.g., ANXA6, ARPC2, etc.), and organelle synthesis (e.g., BLOC1S1, AP2A1, etc.).

In still another example, some cell free DNAs/mRNAs are fragments of, or those encoding a full length or a fragment of, a neoepitope specific to the tumor. With respect to neoepitopes, it should be appreciated that neoepitopes can be characterized as random mutations in tumor cells that create unique and tumor specific antigens. Therefore, high-throughput genome sequencing should allow for rapid and specific identification of patient specific neoepitopes where the analysis also considers matched normal tissue of the same patient. In some embodiments, neoepitopes may be identified from a patient tumor in a first step by whole genome analysis of a tumor biopsy (or lymph biopsy or biopsy of a metastatic site) and matched normal tissue (i.e., non-diseased tissue from the same patient) via synchronous comparison of the so obtained omics information. While not limiting to the inventive subject matter, it is typically preferred that the data are patient matched tumor data (e.g., tumor versus same patient normal), and that the data format is SAM, BAM, GAR, or VCF format. However, non-matched data, or data matched against another reference (e.g., prior same patient normal or prior same patient tumor, or Homo statisticus), are also deemed suitable for use herein. Therefore, the omics data may be ‘fresh’ omics data or omics data that were obtained from a prior procedure (or even a different patient). However, and especially where genomic ctDNA is analyzed, the neoepitope-coding sequence need not necessarily be expressed.
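The matched tumor versus normal comparison described above can be illustrated with a minimal sketch. It assumes that variants have already been called from each sample (for example, from VCF records) and simply keeps the calls that appear in the tumor but not in the matched normal. The `Variant` type, the function name, and the example coordinates are hypothetical and chosen only for illustration; this is a plain set difference, not the synchronous comparison of omics information referred to in the text.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    """A simplified variant call (hypothetical structure for illustration)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def tumor_specific_variants(tumor: Iterable[Variant],
                            normal: Iterable[Variant]) -> List[Variant]:
    """Return variants called in the tumor sample but absent from the matched normal."""
    normal_set = set(normal)
    return [v for v in tumor if v not in normal_set]


# Hypothetical calls; the coordinates are placeholders, not real findings.
tumor_calls = [Variant("chr17", 7674220, "C", "T"), Variant("chr1", 1000, "A", "G")]
normal_calls = [Variant("chr1", 1000, "A", "G")]

somatic = tumor_specific_variants(tumor_calls, normal_calls)
print(somatic)
```

Only the tumor-specific calls remain as candidates for further analysis; the germline call shared with the matched normal is dropped.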

In particularly preferred aspects, the nucleic acid encoding a neoepitope may encode a neoepitope that is also a suitable target for immune therapy. Therefore, neoepitopes can then be further filtered for a match to the patient's HLA type to thereby increase the likelihood of antigen presentation of the neoepitope. Most preferably, and as further discussed below, such matching can be done in silico. Most typically, the patient-specific epitopes are unique to the patient, but may also in at least some cases include tumor type-specific neoepitopes (e.g., Her-2, PSA, brachyury) or cancer-associated neoepitopes (e.g., CEA, MUC-1, CYPB1).

It is contemplated that cell free DNA/mRNA may be present in modified forms or different isoforms. For example, the cell free DNA may be present in methylated or hydroxymethylated form, and the methylation level of some genes (e.g., GSTP1, p16, APC, etc.) may be a hallmark of specific types of cancer (e.g., colorectal cancer, etc.). The cell free mRNA may be present in a plurality of isoforms (e.g., splicing variants, etc.) that may be associated with different cell types and/or locations. Preferably, different isoforms of mRNA may be a hallmark of specific tissues (e.g., brain, intestine, adipose tissue, muscle, etc.), or may be a hallmark of cancer (e.g., a different isoform is present in the cancer cell compared to the corresponding normal cell, or the ratio of different isoforms is different in the cancer cell compared to the corresponding normal cell, etc.). For example, mRNA encoding HMGB1 is present in 18 different alternative splicing variants and 2 unspliced forms. Those isoforms are expected to be expressed in different tissues/locations of the patient's body (e.g., isoform A is specific to prostate, isoform B is specific to brain, isoform C is specific to spleen, etc.). Thus, in these embodiments, identifying the isoforms of cell free mRNA in the patient's bodily fluid can provide information on the origin (e.g., cell type, tissue type, etc.) of the cell free mRNA.

The inventors contemplate that the quantities and/or isoforms (or subtypes) of regulatory noncoding RNA (e.g., microRNA, small interfering RNA, long non-coding RNA (lncRNA)) can vary and fluctuate with the presence of a tumor or an immune response against the tumor. Without wishing to be bound by any specific theory, varied expression of regulatory noncoding RNA in a cancer patient's bodily fluid may be due to genetic modification of the cancer cell (e.g., deletion, translocation of parts of a chromosome, etc.), and/or inflammation at the cancer tissue caused by the immune system (e.g., regulation of the miR-29 family by activation of interferon signaling and/or virus infection, etc.). Thus, in some embodiments, the cell free RNA can be a regulatory noncoding RNA that modulates expression (e.g., downregulates, silences, etc.) of mRNA encoding a cancer-related protein or an inflammation-related protein (e.g., HMGB1, HMGB2, HMGB3, MUC1, VWF, MMP, CRP, PBEF1, TNF-α, TGF-β, PDGFA, IL-1, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-8, IL-9, IL-10, IL-12, IL-13, IL-15, IL-17, Eotaxin, FGF, G-CSF, GM-CSF, IFN-γ, IP-10, MCP-1, PDGF, hTERT, etc.).

It is also contemplated that some cell free regulatory noncoding RNA may be present in a plurality of isoforms or members (e.g., members of the miR-29 family, etc.) that may be associated with different cell types and/or locations. Preferably, different isoforms or members of regulatory noncoding RNA may be a hallmark of specific tissues (e.g., brain, intestine, adipose tissue, muscle, etc.), or may be a hallmark of cancer (e.g., a different isoform is present in the cancer cell compared to the corresponding normal cell, or the ratio of different isoforms is different in the cancer cell compared to the corresponding normal cell, etc.). For example, a higher expression level of miR-155 in the bodily fluid can be associated with the presence of a breast tumor, and a reduced expression level of miR-155 can be associated with a reduced size of the breast tumor. Thus, in these embodiments, identifying the isoforms of cell free regulatory noncoding RNA in the patient's bodily fluid can provide information on the origin (e.g., cell type, tissue type, etc.) of the cell free regulatory noncoding RNA.

Isolation and Amplification of Cell Free DNA/RNA

Any suitable methods to isolate and amplify cell free DNA/RNA are contemplated. Most typically, cell free DNA/RNA is isolated from a bodily fluid (e.g., whole blood) that is processed under suitable conditions, including a condition that stabilizes cell free RNA. Preferably, both cell free DNA and RNA are isolated simultaneously from the same batch of the patient's bodily fluid. Yet, it is also contemplated that the bodily fluid sample can be divided into two or more smaller samples from which DNA or RNA can be isolated separately. Once separated from the non-nucleic acid components, cell free RNA is then quantified, preferably using real time, quantitative PCR or real time, quantitative RT-PCR.

The bodily fluid of the patient can be obtained at any desired time point(s) depending on the purpose of the omics analysis. For example, the bodily fluid of the patient can be obtained before and/or after the patient is confirmed to have a tumor and/or periodically thereafter (e.g., every week, every month, etc.) in order to associate the cell free DNA/RNA data with the prognosis of the cancer. In some embodiments, the bodily fluid of the patient can be obtained from a patient before and after the cancer treatment (e.g., chemotherapy, radiotherapy, drug treatment, cancer immunotherapy, etc.). While it may vary depending on the type of treatment and/or the type of cancer, the bodily fluid of the patient can be obtained at least 24 hours, at least 3 days, or at least 7 days after the cancer treatment. For more accurate comparison, the bodily fluid from the patient before the cancer treatment can be obtained less than 1 hour, less than 6 hours, less than 24 hours, or less than a week before the beginning of the cancer treatment. In addition, a plurality of samples of the bodily fluid of the patient can be obtained during a period before and/or after the cancer treatment (e.g., once a day after 24 hours for 7 days, etc.).

Additionally or alternatively, the bodily fluid of a healthy individual can be obtained to compare the sequence/modification of cell free DNA, and/or quantity/subtype expression of cell free RNA. As used herein, a healthy individual refers to an individual without a tumor. Preferably, the healthy individual can be chosen from among a group of people who share characteristics with the patient (e.g., age, gender, ethnicity, diet, living environment, family history, etc.).

Any suitable methods for isolating cell free DNA/RNA are contemplated. For example, in one exemplary method of DNA isolation, specimens were accepted as 10 ml of whole blood drawn into a test tube. Cell free DNA can be isolated from mono-nucleosomal and di-nucleosomal complexes using magnetic beads that can separate out cell free DNA at a size of 100-300 bp. For another example, in one exemplary method of RNA isolation, specimens were accepted as 10 ml of whole blood drawn into cell-free RNA BCT® tubes or cell-free DNA BCT® tubes containing RNA stabilizers, respectively. Advantageously, cell free RNA is stable in whole blood in the cell-free RNA BCT tubes for seven days while cell free RNA is stable in whole blood in the cell-free DNA BCT Tubes for fourteen days, allowing time for shipping of patient samples from world-wide locations without the degradation of cell free RNA. Moreover, it is generally preferred that the cell free RNA is isolated using RNA stabilization agents that will not or substantially not (e.g., equal or less than 1%, or equal or less than 0.1%, or equal or less than 0.01%, or equal or less than 0.001%) lyse blood cells. Viewed from a different perspective, the RNA stabilization reagents will not lead to a substantial increase (e.g., increase in total RNA of no more than 10%, or no more than 5%, or no more than 2%, or no more than 1%) in RNA quantities in serum or plasma after the reagents are combined with blood. Likewise, these reagents will also preserve the physical integrity of the cells in the blood to reduce or even eliminate release of cellular RNA found in blood cells. Such preservation may be in the form of collected blood that may or may not have been separated. In less preferred aspects, contemplated reagents will stabilize cell free RNA in a collected tissue other than blood for at least 2 days, more preferably at least 5 days, and most preferably at least 7 days. Of course, it should be recognized that numerous other collection modalities are also deemed appropriate, and that the cell free RNA can be at least partially purified or adsorbed to a solid phase to so increase stability prior to further processing.

As will be readily appreciated, fractionation of plasma and extraction of cell free DNA/RNA can be done in numerous manners. In one exemplary preferred aspect, whole blood in 10 mL tubes is centrifuged to fractionate plasma at 1600 rcf for 20 minutes. The so obtained plasma is then separated and centrifuged at 16,000 rcf for 10 minutes to remove cell debris. Of course, various alternative centrifugal protocols are also deemed suitable so long as the centrifugation will not lead to substantial cell lysis (e.g., lysis of no more than 1%, or no more than 0.1%, or no more than 0.01%, or no more than 0.001% of all cells). Cell free RNA is extracted from 2 mL of plasma using Qiagen reagents. The extraction protocol was designed to remove potentially contaminating blood cells and other impurities, and to maintain stability of the nucleic acids during the extraction. All nucleic acids were kept in bar-coded matrix storage tubes, with DNA stored at −4° C. and RNA stored at −80° C. or reverse-transcribed to cDNA that is then stored at −4° C. Notably, so isolated cell free RNA can be frozen prior to further processing.

Omics Data Processing

Once cell free DNA/RNA is isolated, various types of omics data can be obtained using any suitable methods. DNA sequence data will not only include the presence or absence of a gene that is associated with cancer or inflammation, but also take into account mutation data where the gene is mutated, the copy number (e.g., to identify duplication, loss of allele or heterozygosity), and epigenetic status (e.g., methylation, histone phosphorylation, nucleosome positioning, etc.). With respect to RNA sequence data it should be noted that contemplated RNA sequence data include mRNA sequence data, splice variant data, polyadenylation information, etc. Moreover, it is generally preferred that the RNA sequence data also include a metric for the transcription strength (e.g., number of transcripts of a damage repair gene per million total transcripts, number of transcripts of a damage repair gene per total number of transcripts for all damage repair genes, number of transcripts of a damage repair gene per number of transcripts for actin or other housekeeping gene RNA, etc.), and for the transcript stability (e.g., a length of poly A tail, etc.).
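The three transcription strength metrics named above can be illustrated with a minimal Python sketch. The function names, gene labels, and read counts below are hypothetical and chosen only for illustration; they are not taken from the specification.

```python
def transcripts_per_million(gene_count, total_count):
    """Transcripts of one gene per million total transcripts."""
    return gene_count * 1_000_000 / total_count


def fraction_of_gene_set(gene_count, gene_set_counts):
    """Transcripts of one gene relative to all transcripts of a gene set
    (e.g., all damage repair genes, including the gene itself)."""
    return gene_count / sum(gene_set_counts)


def ratio_to_reference(gene_count, reference_count):
    """Transcripts of one gene per transcript of a reference gene
    (e.g., actin or another housekeeping gene)."""
    return gene_count / reference_count


# Hypothetical transcript counts, for illustration only.
repair_gene = 150
other_repair_gene = 50
actin = 3000
total_transcripts = 2_000_000

tpm = transcripts_per_million(repair_gene, total_transcripts)
share = fraction_of_gene_set(repair_gene, [repair_gene, other_repair_gene])
ratio = ratio_to_reference(repair_gene, actin)
```

Which normalization is appropriate would depend on the assay; the sketch only shows that each metric is a simple ratio over a different denominator.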

With respect to the transcription strength (expression level), transcription strength of the cell free RNA can be examined by quantifying the cell free RNA. Quantification of cell free RNA can be performed in numerous manners; however, expression of analytes is preferably measured by quantitative real-time RT-PCR of cell free RNA using primers specific for each gene. For example, amplification can be performed using an assay in a 10 μL reaction mix containing 2 μL cell free RNA, primers, and probe. mRNA of α-actin can be used as an internal control for the input level of cell free RNA. A standard curve of samples with known concentrations of each analyte was included in each PCR plate as well as positive and negative controls for each gene. Test samples were identified by scanning the 2D barcode on the matrix tubes containing the nucleic acids. Delta Ct (dCT) was calculated for each individual patient's blood sample as the Ct value derived from quantitative PCR (qPCR) amplification for each analyte minus the Ct value of actin. Relative expression of patient specimens is calculated using a standard curve of delta Cts of serial dilutions of Universal Human Reference RNA set at a gene expression value of 10 (when the delta CTs were plotted against the log concentration of each analyte).
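The delta Ct calculation and the standard-curve lookup described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the standard curve is taken to be linear in log10 of the assigned expression value, the undiluted reference is assigned a value of 10, and all Ct values and dilution points are hypothetical.

```python
def delta_ct(ct_analyte, ct_actin):
    """dCt = Ct of the analyte minus Ct of the actin internal control."""
    return ct_analyte - ct_actin


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def relative_expression(dct, slope, intercept):
    """Invert the standard curve dCt = slope * log10(value) + intercept."""
    return 10 ** ((dct - intercept) / slope)


# Hypothetical standard curve: serial 10-fold dilutions of a reference RNA,
# with the undiluted reference assigned an expression value of 10.
log10_values = [1.0, 0.0, -1.0, -2.0]   # log10 of 10, 1, 0.1, 0.01
standard_dcts = [2.0, 5.3, 8.6, 11.9]   # hypothetical measured dCt values

slope, intercept = fit_line(log10_values, standard_dcts)

# Hypothetical patient sample.
sample_dct = delta_ct(ct_analyte=27.3, ct_actin=22.0)
sample_expression = relative_expression(sample_dct, slope, intercept)
```

A plain least-squares fit is used here only to keep the sketch self-contained; the specification does not state how the standard curve is fitted.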

Alternatively, where discovery or scanning for new mutations or changes in expression of a particular gene is desired, real time quantitative PCR may be replaced by RNAseq to so cover at least part of a patient transcriptome. Moreover, it should be appreciated that analysis can be performed statically or over a time course with repeated sampling to obtain a dynamic picture without the need for biopsy of the tumor or a metastasis.

Thus, omics data of cell free DNA/RNA preferably comprise a genomic data set that includes genomic sequence information. Most typically, the genomic sequence information comprises DNA sequence information of cell free DNA of the patient and optionally cell free DNA of a healthy individual. The sequence data sets may include unprocessed or processed data sets, and exemplary data sets include those having BAM format, SAM format, FASTQ format, or FASTA format. However, it is especially preferred that the data sets are provided in BAM format or as BAMBAM diff objects (see e.g., US2012/0059670A1 and US2012/0066001A1). Moreover, it should be noted that the data sets are reflective of the cell free DNA/RNA of the patient and of the healthy individual to so obtain patient and tumor specific information. Thus, genetic germ line alterations not giving rise to the diseased cells (e.g., silent mutation, SNP, etc.) can be excluded. Further, so obtained omics information can then be processed using pathway analysis (especially using PARADIGM) to identify any impact of any mutations on DNA repair pathways.

Likewise, computational analysis of the sequence data may be performed in numerous manners. In most preferred methods, however, analysis is performed in silico by location-guided synchronous alignment of cell free DNA/RNA of the patient and a healthy individual as, for example, disclosed in US 2012/0059670A1 and US 2012/0066001A1 using BAM files and BAM servers. Such analysis advantageously reduces false positive data and significantly reduces demands on memory and computational resources.

With respect to the analysis of cell free DNA/RNA of the patient and a healthy individual, numerous manners are deemed suitable for use herein so long as such methods will be able to generate a differential sequence object. However, it is especially preferred that the differential sequence object is generated by incremental synchronous alignment of BAM files representing genomic sequence information of the cell free DNA/RNA of the patient and a healthy individual. For example, particularly preferred methods include BAMBAM-based methods as described in US 2012/0059670 and US 2012/0066001.

Omics Data Analysis: Calculation of a Score

For calculation of a score, it should be appreciated that all data from ct/cf nucleic acids are deemed suitable for use herein and may therefore be specific to a particular tumor and/or patient and/or specific to a cancer. Furthermore, such data may be further normalized or otherwise preprocessed to adjust for age, treatment, gender, stage of disease, etc.

For example, in one aspect of the inventive subject matter the inventors contemplate that a library or reference base for all cancer-related genes, inflammation-related genes, DNA repair-related genes, and/or other non-disease related housekeeping genes can be created using one or more omics data for each of those genes, and such library is particularly useful where the omics data are associated with one or more health parameters. Viewed from a different perspective, while traditional methods of determining cancer prognosis or predicting treatment outcome have been based on a small number of genes, such library can provide a tool to generate a large cross-sectional database for all cancer-related gene activity, inflammation-related gene activity, DNA repair gene activity and housekeeping gene activity (as a control). The large cross-sectional database can be a basis for generating a cancer matrix, based on which a prognosis of a cancer, a health status of the patient, a likelihood of outcome of treatment, and/or an effectiveness of the treatment can be more reliably calculated.

Of course, it should be appreciated that analyses presented herein may be performed over specific and diverse populations to so obtain reference values for the specific populations, such as across various health associated states (e.g., healthy, diagnosed with a specific disease and/or disease state, which may or may not be inherited, or which may or may not be associated with impaired DNA repair, inflammation-related autoimmunity, etc.), a specific age or age bracket, or a specific ethnic group that may or may not be associated with frequent occurrence of a specific type of cancer. Of course, populations may also be enlisted from databases with known omics information, and especially publicly available omics information from cancer patients (e.g., TCGA, COSMIC, etc.) and proprietary databases from a large variety of individuals that may be healthy or diagnosed with a disease. Likewise, it should be appreciated that the population records may also be indexed over time for the same individual or group of individuals, which advantageously allows detection of shifts or changes in the genes and pathways associated with different types of cancers.

In further particularly preferred aspects, it is contemplated that a cancer score can be established for one or more cancer-related genes, inflammation-related genes, a DNA-repair gene, a neoepitope, and a gene not associated with a disease and that the score may be reflective of or even prognostic for various types of cancer that are at least in part due to mutations in cancer-related genes and/or pathways. For example, especially suitable cancer scores may involve scores for one or more genes associated with one or more types of cancer (e.g., BRCA1, BRCA2, P53, etc.) relative to another gene that may or may not be associated with one type of cancer (e.g., housekeeping genes, etc.). In another example, contemplated cancer scores may involve scores for one or more genes associated with one or more types of cancer (e.g., BRCA1, BRCA2, P53, etc.) relative to an overall mutation rate (e.g., mutation rate of the genes not associated with a disease, etc.) to so better identify cancer relevant mutations over ‘background’ mutations.

Additionally, the omics data may be used to generate a general error status for an individual (or tumor within an individual), or to associate the number and/or type of alterations in cancer-related genes, inflammation-related genes, or a DNA-repair gene to identify a ‘tipping point’ for one or more gene mutations after which a general mutation rate skyrockets. For example, where a rate or number of mutations in ERCC1 and other DNA repair genes could have only minor systemic consequence, addition of further mutations to TP53 may result in a catastrophic increase in mutation rates. Thus, and viewed from a different perspective, mutations in the genes associated with DNA repair may be used to estimate the risk of occurrence for a DNA damage-based disease, and especially cancer and age-related diseases. In still further contemplated uses, so obtained omics information may be analyzed in one or more pathway analysis algorithms (e.g., PARADIGM) to so identify affected pathways and to so possibly adjust treatment where treatment employs DNA damaging agents. Pathway analysis algorithms may also be used to modulate in silico the expression of one or more DNA repair genes, which may result in desirable or even unexpected in silico treatment outcomes, which may be translated into the clinic.

With respect to calculation, the inventors contemplate that the cancer score is typically a compound score reflecting status of a plurality of genes. For example, the cancer score can be calculated by counting any mutations (e.g., deletion, missense, nonsense, etc.) of any cancer-related genes, inflammation-related genes, and DNA-repair genes with one or more mutations as having a positive value, counting any changes in methylation or other modifications in DNA of any cancer-related genes and DNA-repair genes, counting any upregulation or downregulation in expression levels of RNA of any cancer-related genes, inflammation-related genes, and DNA-repair genes, counting any presence of tumor-specific, patient-specific neoepitopes, counting any changes or ratios in RNA isotypes (splice variants) of any cancer-related genes and DNA-repair genes, and counting any changes in length of poly A tail of any cancer-related genes, inflammation-related genes, and DNA-repair genes.

The inventors further contemplate that each count may be weighted uniformly or in a biased manner, based on the significance of each count, and then be assigned a value according to the weight of each count (e.g., each count corresponds to 1 point, some counts correspond to different scores such as 1 point, 3 points, 10 points, 100 points, etc.). Some mutations in some cancer related genes may be ‘leading indicators’ or triggers to activate other tumorigenesis mechanisms or metastasis. Identification of such triggers may advantageously allow for early diagnosis or intervention of the cancer. Thus, for example, a mutation in a cancer-specific gene among cancer-related genes, inflammation-related genes, or DNA-repair genes may be weighted higher than other cancer-related genes or DNA-repair genes (e.g., at least 3 times, at least 5 times, at least 10 times, at least 100 times, etc.) and can be assigned a higher value accordingly. As used herein, the cancer-specific gene refers to any gene or mutation of the gene that is a known genetic predisposition (e.g., significantly increases a susceptibility to the disease) to specific types of cancer (e.g., BRCA1 and BRCA2 for breast cancer and ovarian cancer, etc.). In another example, each gene in any cancer-related pathway or DNA-repair pathway may be differently weighted (e.g., most significant, significant, moderate, less significant, insignificant, etc.) and any mutation of such a gene that has any or no impact (e.g., adversely affects the pathway stream, etc.) on any cancer-related pathway or DNA-repair pathway may be weighted differently based on the significance of the impact. Thus, for example, gene A encoding a significant, irreplaceable protein A in a cancer pathway may be weighted more heavily than another gene B encoding a redundant protein (replaceable with other proteins). Also, a nonsense mutation in gene A that results in nonfunctional protein may be weighted at least 3 times, at least 5 times, at least 10 times, or at least 100 times more heavily than a silent mutation in gene A or a missense mutation which does not affect the function of protein A, and can be assigned a higher value accordingly.

In some embodiments, some counts may be weighted equally or differently based on the significance of each count and then be assigned a negative value according to the weight of each count (e.g., each count corresponds to −1 point, some counts correspond to different scores such as −1 point, −3 points, −10 points, −100 points, etc.). For example, upregulation of mRNA of gene C, which can compensate for the loss of function of gene A, can be assigned a negative value (e.g., −10 points) such that it can offset the positive value of the mutation of gene A (e.g., +10 points).

It is also contemplated that some counts may be differently weighted based on the degree of change in expression level of some RNAs. For example, when the expression level of RNA “X” increases at least twice, at least 5 times, at least 10 times, or at least 20 times, while the change in expression level of other RNAs is below 50% at best, then the increase of expression level of RNA “X” may be weighted at least 3 times, at least 5 times, at least 10 times, or at least 100 times more heavily than that of other genes.
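One way to read the fold-change weighting above is as a tiered lookup, sketched here in Python. The pairing of each fold-change level with a particular weight is an assumption made for illustration (the text lists the fold changes and the weights separately), and the function and parameter names are hypothetical.

```python
# Hypothetical tiers: (minimum fold increase, weight). The pairing is assumed.
DEFAULT_TIERS = ((20, 100), (10, 10), (5, 5), (2, 3))


def expression_weight(fold_change, other_changes, tiers=DEFAULT_TIERS):
    """Return a weight for an RNA whose expression rose by `fold_change`.

    `other_changes` holds fractional changes of the other RNAs
    (0.3 means a 30% change). The elevated weight applies only when
    every other change stays below 50%; otherwise the weight is 1.
    """
    if any(abs(change) >= 0.5 for change in other_changes):
        return 1
    for min_fold, weight in tiers:
        if fold_change >= min_fold:
            return weight
    return 1


weight_x = expression_weight(12, [0.1, -0.2])
```

In this sketch a 12-fold increase, with all other RNAs changing by less than 50%, falls into the 10-fold tier.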

Most typically, the cancer score is a compound score that is a total sum of all values assigned to all counts. In some embodiments, the cancer score can be a total sum of all values assigned to all counts (all omics data). In other embodiments, the cancer score can be a total sum of a selected number of values assigned to some counts (e.g., corresponding to specific pathways, specific types of genes, specific groups of mechanisms, etc.). Thus, the cancer score increases as more cancer-related genes or DNA-repair genes possess one or more mutations. In some embodiments, each mutation and/or change may be counted separately such that cancer scores may further increase where one or more cancer-related genes or DNA-repair genes show multiple mutations in a single gene. In other embodiments, the cancer score may further increase when such multiple mutations in a single gene further affect the function of the cancer-related genes or DNA-repair genes such that the multiple mutations render the cells more cancer-prone or more cancerous, or render the cancer microenvironment more immune-resistant, and so on.
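The summation described above can be sketched in Python as follows. The `Count` record, the gene labels, and the point values are hypothetical illustrations of weighted positive and negative counts; only the idea of summing all values, or a selected subset of them, comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Count:
    gene: str      # hypothetical gene label
    kind: str      # e.g., "nonsense_mutation", "upregulation"
    group: str     # e.g., "cancer", "dna_repair", "inflammation"
    points: float  # signed value already reflecting the weight


def cancer_score(counts, groups=None):
    """Sum the values of all counts, or only those in the selected groups."""
    return sum(c.points for c in counts if groups is None or c.group in groups)


# Hypothetical counts, for illustration only.
counts = [
    Count("GENE_A", "nonsense_mutation", "cancer", 10),
    Count("GENE_A", "silent_mutation", "cancer", 1),
    Count("GENE_B", "missense_mutation", "dna_repair", 3),
    # Upregulation of a compensating gene is assigned a negative value.
    Count("GENE_C", "upregulation", "cancer", -10),
]

total_score = cancer_score(counts)
repair_score = cancer_score(counts, groups={"dna_repair"})
```

Each mutation of GENE_A is recorded as its own count, so multiple mutations in a single gene raise the score separately, as contemplated above.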

Alternatively or additionally, the cancer score can be presented as a trajectory with one or more counts as its vectors, where a small number of variables and/or factors dominantly govern the determination of cancer prognosis. Each of the variables and/or factors can be presented as a vector, whose magnitude corresponds to the points of each weighted count, and the addition of those vectors provides a trajectory indicating the prognosis of the disease. Viewed from a different perspective, it should be appreciated that multiple analyses over time can be prepared for the same patient, and that changes over time (e.g., with or without treatment) may be assigned specific values that will yet again generate a time-dependent score. Such scores or changes over time may be classified and serve as a leading indicator for treatment outcome, drug response, etc.

Additionally, it is also contemplated that the cancer score can be calculated with health information other than cf/ct nucleic acid data obtained from the patient's blood. For example, the health information may include expression levels/concentrations of several types of cytokines (e.g., IL-2, TNF-α, etc.) related to tumorigenesis/inflammation/immune response against the tumor, hormone levels (e.g., estrogen, progesterone, growth hormone, etc.), blood sugar level, alanine transaminase level (for liver function), creatinine level (for kidney function), blood pressure, types and quantity of tumor cell-secreted proteins (e.g., soluble ligands of immune cell receptor, etc.) or foreign antigenic proteins (e.g., for virus or bacterial infection, etc.).

The inventors contemplate that the so obtained cancer score can be used to provide a diagnosis of cancer or risk of having or developing a cancer. In some embodiments, the calculated cancer score of a patient can be compared with an average cancer score of healthy individuals to determine the difference between the two scores. Preferably, when the difference between the two scores is above a threshold value, the patient may be diagnosed as having a tumor, or as having a high risk of having a tumor. In other embodiments, the calculated cancer score of a patient can be compared with a predetermined threshold score. The predetermined threshold score can be a predetermined score, which may vary depending on the patient's ethnicity, age, gender, or other health status. In other embodiments, the predetermined threshold score can be a dynamic score that can be changed based on a previous cancer score and a diagnosis or treatment performed on the patient.
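The two comparisons described above (against an average score of healthy individuals, and against a predetermined threshold score) can be sketched in Python. All scores and cut-off values below are hypothetical, as are the function names.

```python
from statistics import mean


def exceeds_healthy_baseline(patient_score, healthy_scores, min_difference):
    """True when the patient's score exceeds the healthy average by more
    than `min_difference`."""
    return patient_score - mean(healthy_scores) > min_difference


def exceeds_threshold(patient_score, threshold_score):
    """True when the patient's score is above a predetermined threshold,
    which may itself depend on age, gender, or other health status."""
    return patient_score > threshold_score


# Hypothetical values, for illustration only.
healthy_scores = [90, 100, 110]
flagged_by_baseline = exceeds_healthy_baseline(180, healthy_scores, 50)
flagged_by_threshold = exceeds_threshold(180, 150)
```

A dynamic threshold could be modeled by passing a different `threshold_score` at each visit; how it would be updated is not specified in the text.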

The inventors also contemplate that the so obtained cancer score can be used to provide a prognosis of the cancer. For example, the cancer scores can be calculated based on omics data obtained in month 1, month 3, month 6, and month 12 after the patient was diagnosed with a first stage of lung cancer, and each cancer score can be compared with a predetermined threshold score corresponding to month 1, 3, 6, and 12. The cancer scores are about 120% of the threshold score in months 1 and 3, about 180% in month 6, and about 230% of the threshold score in month 12. Such progress indicates that the prognosis of the lung cancer of the patient is not optimistic absent intervention. In another example, the cancer score can be calculated by highly weighting the presence of neoepitopes that are tumor-specific and patient-specific. In this example, the cancer scores can be calculated based on omics data obtained in month 1, month 3, month 6, and month 12 after the patient was diagnosed with a first stage of lung cancer, and each cancer score is calculated by highly weighting the presence/appearance of a new epitope that is tumor/tissue specific. The cancer scores are about 120% of the threshold score in months 1 and 3, about 140% in month 6, and about 230% of the threshold score in month 12. Such progress indicates a possible metastasis of the tumor to another organ (releasing a different type of neoepitope) or development of a different type of tumor in the same organ (releasing a different type of neoepitope).
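The time course in the first example above can be expressed as a percentage of the per-month threshold, as in the following Python sketch. The absolute scores and thresholds are hypothetical values chosen to reproduce the 120%, 120%, 180%, and 230% figures, and the `is_worsening` rule is an assumed, simplified reading of "not optimistic".

```python
def percent_of_threshold(scores, thresholds):
    """Map each month to the score as a percentage of that month's threshold."""
    return {month: 100.0 * scores[month] / thresholds[month] for month in scores}


def is_worsening(percentages):
    """True when the percentage never falls over time and ends higher
    than it started (an assumed, simplified criterion)."""
    ordered = [percentages[month] for month in sorted(percentages)]
    never_falls = all(later >= earlier for earlier, later in zip(ordered, ordered[1:]))
    return never_falls and ordered[-1] > ordered[0]


# Hypothetical absolute scores and per-month thresholds (keys are months).
scores = {1: 120, 3: 120, 6: 180, 12: 230}
thresholds = {1: 100, 3: 100, 6: 100, 12: 100}

percentages = percent_of_threshold(scores, thresholds)
worsening = is_worsening(percentages)
```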

In a further example, the cancer scores can provide an indicator for treatment options. The treatment option may be a prophylactic treatment where the compound score is below the threshold value, indicating that the patient is unlikely to have a tumor for now or at least has a low risk of developing a tumor. When the cancer score is above the threshold value and a majority portion of the cancer score (e.g., over a threshold such as at least 10%, at least 20%, at least 30%, at least 50%, etc.) is attributable to highly weighted overexpression of a cancer-related gene A, then the cancer score can be used to provide the treatment option that may use a drug inhibiting the activity of cancer-related gene A (e.g., a blocker of protein A, etc.). Similarly, when the cancer score is above the threshold value and a majority portion of the cancer score is attributable to highly weighted overexpression of a gene encoding a receptor of an immune cell or a ligand of the receptor, then the cancer score can be used to provide the immunotherapy using the receptor or ligand of the immune cells. Also, when the cancer score is above the threshold value and a majority portion of the cancer score is attributable to highly weighted overexpression of a specific neoepitope, then the cancer score can be used to provide the immunotherapy using the neoepitope as a bait, or a surgery or radiation therapy to physically remove local tumors. Such cancer scores may also be indicative of the likelihood of success for the treatment option. However, if the portion of the cancer score attributable to highly weighted overexpression of a cancer-related gene A is below the threshold, then the treatment option using a drug inhibiting the activity of cancer-related gene A may be predicted to be less effective.
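The selection logic above can be sketched in Python as a check of the total score followed by a check of which contribution dominates. The contribution labels, the option strings, the 50% default share, and the fallback messages are hypothetical; the sketch also assumes non-negative contributions and a positive score threshold.

```python
# Hypothetical mapping from dominant contribution to a treatment option.
OPTIONS = {
    "cancer_gene_A_overexpression": "drug inhibiting activity of gene A",
    "immune_receptor_or_ligand_overexpression": "immunotherapy using the receptor or ligand",
    "neoepitope": "immunotherapy using the neoepitope, surgery, or radiation therapy",
}


def dominant_contributor(contributions):
    """Return the largest contribution and its share of the total score."""
    total = sum(contributions.values())
    name = max(contributions, key=contributions.get)
    return name, contributions[name] / total


def suggest_option(contributions, score_threshold, share_threshold=0.5):
    """Suggest an option from the total score and its dominant contribution."""
    total = sum(contributions.values())
    if total <= score_threshold:
        return "prophylactic treatment"
    name, share = dominant_contributor(contributions)
    if share < share_threshold:
        return "no single dominant target"
    return OPTIONS.get(name, "no mapped option")


# Hypothetical breakdown of a cancer score of 100.
breakdown = {"cancer_gene_A_overexpression": 70, "neoepitope": 20, "other": 10}
suggestion = suggest_option(breakdown, score_threshold=50)
```

Lowering `share_threshold` (for example to 0.3) corresponds to the lower portion thresholds mentioned in the text.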

Consequently, the patient can be treated with at least one of the treatment options based on the patient's cancer (compound) score. For example, where the cancer score is above the threshold value and a majority portion of the cancer score is attributable to highly weighted overexpression of a specific neoepitope, the treatment option can be selected to include a recombinant virus (or yeast or bacteria) comprising a nucleic acid encoding the specific neoepitope. Then, the recombinant virus can be administered to the patient in a dose and schedule effective to treat the tumor and/or effective to reduce the cancer score of the patient by at least 10%, at least 20%, or at least 30% within 2 weeks, 4 weeks, 8 weeks, or 12 weeks after the administration or a series of administrations.

It is also contemplated that the patient's cancer score can be compared with one or more other patients having the same type of cancer and having a treatment history to provide a treatment option and predicted outcome. For example, where other patients' history indicates that the drug treatment is effective only when the cancer score is below 200 (as absolute score), or less than 180% of the healthy individual's score, and the patient's cancer score has been increasing from 140 to 160 for the last 2 weeks, a recommendation to proceed with drug treatment no later than 2 weeks can be provided based on the other patients' history and cancer scores.

The calculated cancer score can also be an indicator of an effectiveness of a cancer treatment, especially when the omics data includes information of at least one or more genes encoding a target/indicator of the cancer treatment. For example, cancer scores can be calculated based on omics data obtained before the cancer treatment, and 7 days, 2 weeks, 1 month, and 6 months after the cancer treatment. The cancer score 7 days after the treatment is 80% of the cancer score before the treatment, the cancer score 2 weeks and 1 month after the treatment is 50% of the cancer score before the treatment, and the cancer score 6 months after the treatment is 150% of the cancer score before the treatment. Such progress indicates that the treatment was effective at least for a short term (e.g., up to 1 month), yet the effectiveness decreased over time and the treatment may not be effective at all 6 months after the treatment. In some embodiments, the cancer scores before and after treatment can be compared with a predetermined threshold value to determine the effectiveness of the treatment. For example, if the cancer score is 200 before the treatment and 130 after the treatment where the threshold cancer score is 150, then the treatment can be determined “effective” as the cancer score drops below the threshold after the treatment. However, if the cancer score is 200 before the treatment and 160 after the treatment where the threshold cancer score is 150, then the treatment can be determined “not effective” as the cancer score stays above the threshold after the treatment even though the absolute value of the cancer score is decreased. Consequently, the inventors further contemplate that the patient continues with administering the treatment option (e.g., immune therapy, etc.) when the treatment can be determined “effective”, when the cancer score after the treatment is lower than the predetermined threshold, when the cancer score after the treatment is at most 5% or at most 10% higher than the predetermined threshold, or when the cancer score after the treatment is at least 5%, at least 10%, or at least 15% lower than the predetermined threshold.
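The effectiveness checks described above can be sketched in Python. The baseline and follow-up scores are hypothetical values chosen to reproduce the 80%, 50%, 50%, and 150% time course, and the `tolerance_pct` parameter is an assumed way of expressing the "at most 5% or 10% higher than the threshold" criterion.

```python
def relative_scores(baseline, follow_ups):
    """Express each follow-up score as a percentage of the pre-treatment score."""
    return {label: 100.0 * score / baseline for label, score in follow_ups.items()}


def is_effective(score_after, threshold):
    """Treatment counts as effective only if the score drops below the threshold."""
    return score_after < threshold


def continue_treatment(score_after, threshold, tolerance_pct=0.0):
    """Continue when the score is at most `tolerance_pct` percent above the threshold."""
    return score_after <= threshold * (1 + tolerance_pct / 100.0)


# Hypothetical scores reproducing the time course in the text.
course = relative_scores(200, {"7 days": 160, "2 weeks": 100, "1 month": 100, "6 months": 300})

# Score falls from 200 to 160 but stays above a threshold of 150.
effective = is_effective(160, 150)
continue_with_10pct = continue_treatment(160, 150, tolerance_pct=10)
continue_with_5pct = continue_treatment(160, 150, tolerance_pct=5)
```

The last two lines show how the same post-treatment score can lead to different decisions depending on the tolerance chosen.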

The inventors also contemplate that the effectiveness of some cancer treatments can be determined by analyzing omics data including foreign DNA/RNA originating from a carrier of the immune therapy (e.g., virus, bacteria, yeast, etc.). For example, where the virus is a carrier to deliver a recombinant nucleic acid encoding recombinant killer activation receptor (KAR), the level of cell free DNA/RNA of recombinant KAR in the patient's blood can be an indicator of an effectiveness of infection of the virus.

It should be apparent to those skilled in the art that many more modifications besides those already described are possible without departing from the inventive concepts herein. The inventive subject matter, therefore, is not to be restricted except in the scope of the appended claims. Moreover, in interpreting both the specification and the claims, all terms should be interpreted in the broadest possible manner consistent with the context. In particular, the terms “comprises” and “comprising” should be interpreted as referring to elements, components, or steps in a non-exclusive manner, indicating that the referenced elements, components, or steps may be present, or utilized, or combined with other elements, components, or steps that are not expressly referenced. As used in the description herein and throughout the claims that follow, the meaning of “a,” “an,” and “the” includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise. Where the specification or claims refer to at least one of something selected from the group consisting of A, B, C . . . and N, the text should be interpreted as requiring only one element from the group, not A plus N, or B plus N, etc.

What is claimed is:
 1. A method of selecting a treatment option for a cancer patient, comprising: obtaining blood from a patient having a cancer; obtaining from the blood omics data of the cancer patient for a plurality of cancer genes, wherein the omics data comprise at least one of DNA sequence data, RNA sequence data, and RNA expression level; wherein the omics data comprise an expression level of a cancer related gene, an expression level of an immune therapy related gene, and an expression level of a DNA or RNA sequence encoding a neoepitope; providing an omics record computer system that includes at least one processor and at least one computer readable memory coupled to the processor and configured to digitally store the omics data for the plurality of cancer-related genes in the at least one memory; analyzing, in silico, the omics data to generate a digital cancer gene score, wherein the digital cancer gene score is calculated in silico using the omics data; and administering, when (1) the digital cancer gene score exceeds a threshold level and (2) the majority portion of the digital cancer gene score is weighted for the cancer related gene, the immune therapy related gene, or the DNA or RNA sequence encoding a neoepitope, a therapeutic agent that targets the cancer related gene, the immune therapy related gene, or the DNA or RNA sequence encoding the neoepitope.
 2. The method of claim 1, wherein the blood omics data are obtained from DNA/RNA that is enclosed in a vesicular structure or bound to non-nucleotide molecule.
 3. The method of claim 1, wherein the blood omics data are obtained from cell-free DNA or cell-free RNA.
 4. The method of claim 1, wherein the DNA sequence data are selected from the group consisting of mutation data, copy number data duplication, loss of heterozygosity data, and epigenetic status.
 5. The method of claim 1, wherein the RNA sequence data are selected from the group consisting of mRNA sequence data and splice variant data.
 6. The method of claim 1, wherein the RNA expression level data are selected from the group consisting of a quantity of RNA transcript and a quantity of a small noncoding RNA.
 7. The method of claim 1, wherein DNA sequence data are obtained from circulating free DNA.
 8. The method of claim 1, wherein the RNA sequence data are obtained from the group consisting of circulating tumor RNA and circulating free RNA.
 9. The method of claim 1, wherein the cancer related gene is selected from the group consisting of a gene encoding a protein kinase, a cancer-specific gene, a cancer associated gene, a DNA polymerase gene, a nuclease gene, a replicated associated gene,
 10. The method of claim 1, wherein the immune therapy related gene is selected from the group consisting of a DNA repair gene, an RNA repair gene, an inflammation-related gene, a chemokine gene, a cytokine gene, a chemokine receptor gene, a cytokine receptor gene, a homologous recombination gene, non-homologous end joining gene,
 11. The method of claim 1, wherein the DNA or RNA sequence encoding the neoepitope is a DNA or RNA sequence that encodes a patient- and tumor specific neoepitope, is a sequence that encodes a tumor type-specific neoepitope, or is a sequence that encodes a cancer-associated neoepitope.
 12. The method of claim 1, wherein the neoepitope is matched to the patient's HLA type.
 13. The method of claim 1, wherein the cancer related gene or the immune therapy related gene is selected from the group of genes presented in Table 1, Table 2, and Table 3.
 14. The method of claim 1, further comprising a step of processing the omics data using pathway analysis to thereby identify one or more pathways affected by the expression level.
 15. The method of claim 1, wherein the cancer score is a compound score reflecting status of the cancer related gene, the immune therapy related gene, and the DNA or RNA sequence encoding a neoepitope.
 16. The method of claim 1, wherein the therapeutic agent is an enzyme inhibitor, a receptor ligand.
 17. The method of claim 1, wherein the therapeutic agent is an immune therapy.
 18. The method of claim 1, wherein the treatment option is surgery or radiation therapy.